Abstract

Current languages for safely manipulating values with names only
support term languages with simple binding syntax. As a result, no
tools exist to safely manipulate code written in those languages for
which name problems are the most challenging. We address this
problem with Romeo, a language that respects α-equivalence on
its values, and which has access to a rich specification language
for binding, inspired by attribute grammars. Our work has the
complex-binding support of David Herman's λ_m, but is a
full-fledged binding-safe language like Pure FreshML.

Categories and Subject Descriptors D.3.1 [Formal Definitions 
and Theory] 

Keywords languages; binding; alpha-equivalence; macros 

1. Introduction 

Manipulating terms with binding information has traditionally
posed a serious problem to metaprogrammers. In a survey of 9
DSL implementations, 8 were found to be prone to variable
capture [4].

Building a system that does not suffer from this problem is
difficult, because a name in isolation is meaningless; only its
relationship with other names imbues a name with meaning. On the other
hand, the meaninglessness of names inspires a motto for what it
means to manipulate names correctly: a system is correct if
α-conversion of input values results in outputs that are identical up
to α-conversion [8].

Systems with this property have been created, allowing for 
programming with names but without insidious name problems. 
However, these systems generally support only term languages with 
simple binding structure [5, 12]. Our work is an extension of David 
Herman's macro system that supports complex binding structure 
[7], but as a full-fledged binding-safe language inspired by Pure 
FreshML [13]. 

1.1 Motivation 

For example, consider the following Scheme term, exhibiting a 
complex binding structure: 

ICFP'14, September 1-6, 2014, Gothenburg, Sweden. 

Copyright is held by the owner/author(s). Publication rights licensed to ACM. 
ACM 978-1-4503-2873-9/14/09. . .$15.00. 
http://dx.doi.org/10.1145/2628136.2628162 



(let* ((a 1)
       (b (+ a a))
       (c (* b 5)))
  (display c))

In Scheme, the let* syntactic form is defined to bind the names it
introduces not only in the body, but also in the right-hand side of
each subsequent arm. Thus, this example has no free names, and
the value of c is 10. Formally expressing these properties is the job
of binding specifications, such as those in Nominal Isabelle [17],
Ott [16], and Cαml [12].
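
The scoping rule of let* can be made concrete with a small Python
sketch. This is only an illustration of sequential scoping, not part
of Romeo; the helper name let_star and the encoding of right-hand
sides as functions of an environment are assumptions made for the
example.

```python
def let_star(clauses, body, env=None):
    """Evaluate a let*-like form.

    clauses: list of (name, rhs) pairs; each rhs is a function of the
             environment built so far.
    body:    a function of the final environment.
    """
    env = dict(env or {})
    for name, rhs in clauses:
        # The right-hand side sees every earlier binding; the new
        # binding then shadows any earlier binding of the same name.
        env[name] = rhs(env)
    return body(env)

# (let* ((a 1) (b (+ a a)) (c (* b 5))) c)
original = let_star(
    [("a", lambda e: 1),
     ("b", lambda e: e["a"] + e["a"]),
     ("c", lambda e: e["b"] * 5)],
    lambda e: e["c"],
)

# A variant in which every binder is called d; each clause shadows
# the previous one, so the meaning is unchanged.
renamed = let_star(
    [("d", lambda e: 1),
     ("d", lambda e: e["d"] + e["d"]),
     ("d", lambda e: e["d"] * 5)],
    lambda e: e["d"],
)
```

Both calls compute the same value, because each right-hand side is
evaluated in the scope of the clauses before it.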

If we want to programmatically manipulate source code while
respecting its binding structure, we must add to our system a notion
of α-equivalence. Only then can the correctness motto of preservation
of α-equivalence be well-defined. We would like our notion
of α-equivalence to be as compositional as possible, even though a
pair of α-equivalent terms may decompose into pairs of subterms which
are not α-equivalent. For example, our expression above is
α-equivalent to the following:

(let* ((d 1)
       (d (+ d d))
       (d (* d 5)))
  (display d))

This is despite the fact that (display d) is clearly not
α-equivalent to (display c). The standard solution is to observe
that they are α-equivalent after performing a substitution determined
by examining the binders. This is our solution as well, but
in order to model forms like let*, we must allow binders to be
"exported" up from subforms (and sub-subforms, etc.) in a
well-defined way. We say such binders are "buried." Furthermore, such
a buried binder may be referred to inside the form that exports it,
causing it to participate in multiple binding relationships at
different levels.

It is instructive to contrast our work to another system capable 
of handling complex binding structure, the Dybvig algorithm [3] 
used in many Scheme implementations. In this algorithm, names 
are dynamically marked to indicate which macro evaluation they 
originated from, in order to prevent errors caused by coincidental 
name collision. We will discuss the limitations of this approach in 
section 7.4. 

An improvement upon this is David Herman's λ_m-calculus [7],
a macro system that uses binding specifications to guarantee that
α-equivalent inputs will expand to α-equivalent outputs. However, its
macros can be defined only in terms of pattern-matching.
Pattern-matching systems are a natural way to define simple macros, but
they lack the power to define more intricate macros, which are often
the macros that benefit the most from access to complex binding
constructs.
CoreExpr ::= RAtom                        (variable reference)   var
           | Prod(CoreExpr, CoreExpr)     (application)          app
           | Prod(BAtom, CoreExpr↓0)      (abstraction)          lam

Expr ::= RAtom                            (variable reference)   var
       | Prod(Expr, Expr)                 (application)          app
       | Prod(BAtom, Expr↓0)              (abstraction)          lam
       | Prod(LetStarClauses, Expr↓0)     (sequential let)       let-star

LetStarClauses ::= Prod()                 (no clauses)           lsc-none
                 | Prod⇑(1▷0)(Prod⇑0(BAtom, Expr), LetStarClauses↓0)
                                          (clause, and more clauses)   lsc-some

Figure 1. Example types for two lambda calculi, one of which has the let* form. (The names from the right-hand column are used to
identify injections and as the variables for cases in Figure 2.)



 1 (define-fn (convert e : Expr) : CoreExpr
 2   (case e
 3     (var => (inj var var))
 4     (app => (open app (e1,e2) (inj app (prod convert(e1), convert(e2)))))
 5     (lam => (open lam (bv,e-body) (inj lam (prod bv, convert(e-body)↓0))))
 6     (let-star =>
 7       (open let-star (lsc,e-body)
 8         (case lsc
 9           (lsc-none => convert(e-body))
10           (lsc-some =>
11             (open lsc-some (bv,val-expr,lsc-rest)
12               (let e-rest be convert((inj let-star (prod lsc-rest, e-body↓0)))
13                in (inj app (prod (inj lam (prod bv, e-rest↓0)), convert(val-expr))))))))))))

Figure 2. A Romeo-L function to expand away let*



Our system is more flexible and can even operate outside the 
context of macro expansion altogether: we provide a full-fledged 
programming language for manipulating terms. 

1.2 Example 

As an example, we define in Figure 1 some types in our system, and
in Figure 2 we define a function (using those types) that translates
expressions from the lambda calculus augmented with a let*
construct into the plain lambda calculus. We will discuss the meaning
of the types in section 2.1, the behavior of the code in section 4.3,
the way Romeo preserves α-equivalence for it in section 5.1, and
the static guarantees that Romeo provides in section 6.3.

1.3 Contributions 

Our primary contribution is an extension of David Herman's system 
for binding-safety in a pattern-matching macro system [7] to cover 
macros defined by procedures, and thus general meta-programming 
for terms with bindings. Our language is inspired by Pure FreshML 
[13]. 

Our system has the following features: 

• Values in Romeo are "plain old data": atoms arranged in
abstract syntax trees without binding information. Types provide
the missing binding information.

• Romeo has an execution semantics which ensures that instead
of a name "escaping" the context in which it is defined, a FAULT
is produced.

• We prove a theorem guaranteeing that, in any execution, the
dynamic environment can be replaced by one with α-equivalent
values, and that execution will proceed to a value α-equivalent
to what it otherwise would have.

• We provide a deduction system with which the programmer can
establish that escape (and thus, FAULT) will never occur.

2. Binding language 

2.1 Overview of binding types 

Values in our system are plain old data, that is, S-expressions
or something similar. We use binding types to specify the
binding properties of these terms. Binding types augment a traditional
context-free grammar with a single attribute (in the style of an
attribute grammar) that represents the flow of bindings from one
subterm to another.

In Figure 1, the type definition of CoreExpr looks like a traditional
grammar for the lambda calculus, with one major difference:
the notation CoreExpr↓0 indicates that the binder "exported" by
the child in position 0 (the BAtom, which exports the name in that
position) is to be in scope in the CoreExpr body of the lambda.
To facilitate the connecting of binders to references, the type for
names that bind, BAtom, is made distinct from the type of names
that reference binders, RAtom.

Our system observes the convention that all names bound in a 
particular value are bound in all subvalues, unless overridden by a 
new binding for the same name. It is possible to imagine a system in 
which old names are removable (e.g., a construct (unbind x e), in
which the name x is not a valid reference in e, even if it was outside 
that construct), but this does not appear to be a feature that users 
are clamoring for. (But see the end-of-scope operator described by 
Hendriks and van Oostrom [6].) 

A way to define the exports of a wide product is required when
the binders are exported longer distances up the tree. Consider
let*, in Expr. The "sequential let" line indicates that the binders
exported by LetStarClauses are in scope in the body of the let*
expression. The grammar for LetStarClauses says that a set of
let*-clauses is either empty or a pair consisting of a single clause and
the rest of the clauses. In the second production for LetStarClauses,
the Prod⇑0(BAtom, Expr) indicates that the first clause exports the
binder from its position 0 (that is, the BAtom). However, this name
is not in scope in the Expr. The LetStarClauses↓0 indicates that the
names exported from the first clause are in scope in the remainder
of the clauses.

Expr ::= ... (same as before)
       | Prod(BAtom, BAtom, Expr↓0⊎1, BAtom, Expr↓0⊎3)   (event handler)

Figure 3. Typed production rule for event handler

The ⇑(1 ▷ 0) indicates that the entire set of clauses exports
all the binders exported by either the first clause or the rest of the
clauses, and that names in the rest of the clauses override those from
the first clause. If we had wanted to specify that all the binders in a
let* must be distinct, then we could have written ⇑(1 ⊎ 0), which
behaves like ⇑(1 ▷ 0), except that duplicated atoms are an error.

Thus we have an attribute grammar with a single attribute, 
whose values are sets of names representing bindings. These sets 
are synthesized from binders and inherited by other terms until they 
flow to references. 

Our notations ↓, ⇑, ▷, and ⊎ form an algebra of attributes; the
tractability of this algebra is a key to many of our results. We call
the terms in this language binding combinators.

2.1.1 Example: multiple, partially-shared bindings 

For another example, imagine constructing a pair of event handlers, 
one of which handles mouse events and one of which handles 
keyboard events, but both of which need to know what GUI element 
is focused. This new form, defined in Figure 3, binds three atoms 
(the BAtoms, which are in positions 0, 1, and 3), one of which is 
bound in both subexpressions, and two of which are bound in only 
one of them. Here is a possible use of this new form: 

(handler gui-elt
         mouse-evt (deal-with gui-elt mouse-evt)
         kbd-evt   (tag gui-elt (text-of kbd-evt)))

And here is an α-equivalent, but harder-to-read, version:

(handler a
         b (deal-with a b)
         b (tag a (text-of b)))

The scope of the first b is the (deal-with ...), and the scope of
the second one is the (tag ...).

Regardless of whether they have the same names, the meanings
of the two events must not be conflated, but the GUI element must
not be differentiated. For this reason, the operations our system
performs on products must handle binding by first identifying what
names are exported by each child (e.g. a BAtom or a Prod with a
non-empty ⇑), and then determining which names are imported by
which children. The latter is the responsibility of the ↓ operator.

Our goal of supporting realistic concrete syntax is particularly
relevant here. The handler statement could be implemented as a
function that gets called with a lambda (binding gui-elt) that
returns a pair of lambdas (binding the -evts), at the cost of some
inconvenience for the programmer. If the only binding construct in
a language were lambda, an "off-the-shelf" nominal logic system
would suffice as a basis for Romeo. However, programmer
convenience is precisely the point of metaprogramming systems.



2.2 Binding types, in more detail 

In this section, we introduce our actual language of binding types 
and the metalanguage we use to describe them. 

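\[
\begin{array}{rcl}
a &\in& \mathrm{Atom} \\
v \in \mathrm{Value} &::=& a \mid \mathrm{inj0}(v) \mid \mathrm{inj1}(v) \mid \mathrm{prod}_i(v_i) \\
\tau \in \mathrm{Type} &::=& \mathrm{BAtom} \mid \mathrm{RAtom} \mid \tau + \tau
  \mid \mathrm{Prod}^{\Uparrow\beta_{ex}}_i(\tau_i{\downarrow}\beta_i) \mid \mu X.\tau \mid X
\end{array}
\]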

Values are either atoms, left- or right-injections of values (to
model sum types), or tuples of values. We write $\mathrm{prod}_i(v_i)$ for the
tuple $(v_0, \ldots, v_n)$, for some $n$. We will use notation like this for
sequence comprehensions throughout our presentation.

The basic types are BAtom (for binders) and RAtom (for
references). These types tell us how to interpret atoms. By convention,
BAtoms export themselves and RAtoms export nothing.

Tuples are interpreted by Prod types. The wide product type
$\mathrm{Prod}^{\Uparrow\beta_{ex}}(\tau_0{\downarrow}\beta_0, \ldots, \tau_n{\downarrow}\beta_n)$,
which we denote by the comprehension
$\mathrm{Prod}^{\Uparrow\beta_{ex}}_i(\tau_i{\downarrow}\beta_i)$, tells us how to
interpret the value $\mathrm{prod}_i(v_i)$.
The term $\beta_i$, constructed in our algebra of attributes, combines
(some of) the binders exported by $v_0, \ldots, v_n$ to determine the
local names bound in $v_i$. Again, by convention, these names
override those inherited from outside ("above") $\mathrm{prod}_i(v_i)$. The
binding combinator $\beta_{ex}$, similarly constructed in our algebra of
attributes, combines the binders exported by $v_0, \ldots, v_n$ to determine
the names exported as binders by the tuple $\mathrm{prod}_i(v_i)$.

To sidestep issues of parsing, we have sum types and injections.
A value $\mathrm{inj0}(v)$ (resp. $\mathrm{inj1}(v)$) is interpreted by the type
$\tau_0 + \tau_1$ so that $v$ is interpreted by $\tau_0$ (resp. $\tau_1$).

Last, we have recursive types $\mu X.\tau$, where $\tau$ must be
productive; to interpret a value $v$ according to $\mu X.\tau$ is to interpret it
according to $\tau[\mu X.\tau/X]$.

2.3 The algebra of binding combinators 

Binding combinators are terms built from the following grammar: 

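\[
\begin{array}{rcl}
i, \ell &\in& \mathbb{N} \\
\beta \in \mathrm{Beta} &::=& \emptyset \mid \beta \triangleright \beta \mid \beta \uplus \beta \mid \ell
\end{array}
\]

As discussed above, we use binding combinators to collect
names from the sets exported by the subterms of a sequence
$\mathrm{prod}_i(v_i)$. We will need to interpret these combinators over both
sets of names and substitutions (finite maps from names to names).
As before, we make liberal use of comprehensions: we write
$[\![\beta]\!](A_i)_i$ for $[\![\beta]\!](A_0, \ldots, A_n)$, etc. The
interpretation is as follows:

\[
\begin{array}{rcl}
[\![\_]\!](\_) &:& \mathrm{Beta} \times \overline{\mathrm{AtomSet}} \to \mathrm{AtomSet} \\
[\![\emptyset]\!](A_i)_i &=& \emptyset \\
[\![\ell]\!](A_i)_i &=& A_\ell \\
[\![\beta \triangleright \beta']\!](A_i)_i &=& [\![\beta]\!](A_i)_i \cup [\![\beta']\!](A_i)_i \\
[\![\beta \uplus \beta']\!](A_i)_i &=& [\![\beta]\!](A_i)_i \uplus [\![\beta']\!](A_i)_i
\end{array}
\]

Here and elsewhere, we write $\overline{X}$ to mean a sequence of $X$s.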
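
As a rough illustration of this interpretation, the following Python
sketch evaluates a binding combinator over a sequence of atom sets.
The encoding is an assumption made for the example and is not taken
from Romeo: None stands for the empty combinator, an int for a
position, and tagged tuples for the two binary operators.

```python
def over(b1, b2):
    """Build the combinator b1 ▷ b2."""
    return ("over", b1, b2)

def disj(b1, b2):
    """Build the combinator b1 ⊎ b2."""
    return ("disj", b1, b2)

def interp_sets(beta, sets):
    """Interpret the combinator `beta` over a sequence of atom sets."""
    if beta is None:                      # the empty combinator
        return frozenset()
    if isinstance(beta, int):             # a position: select that set
        return frozenset(sets[beta])
    tag, left, right = beta
    l = interp_sets(left, sets)
    r = interp_sets(right, sets)
    if tag == "disj" and l & r:
        # ⊎ is a disjoint union: duplicated atoms are an error.
        raise ValueError(f"duplicate binders: {sorted(l & r)}")
    return l | r

# Exports of a pair of let* clauses: first clause binds a, the rest bind b and c.
exports = interp_sets(over(1, 0), [{"a"}, {"b", "c"}])
```

On sets, the two operators differ only in whether duplicates are
rejected; the override behaviour of ▷ becomes visible on substitutions.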

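A substitution $\sigma$ is a partial function from atoms to atoms. For
the purposes of manipulating them, we represent substitutions as
a set of ordered pairs of atoms. Our substitutions are naive, which
is to say that they ignore binding structure and simply affect all
names. We interpret $\beta$'s on substitutions as follows:

\[
\begin{array}{rcl}
[\![\_]\!](\_) &:& \mathrm{Beta} \times \overline{\mathrm{Subst}} \to \mathrm{Subst} \\
[\![\emptyset]\!](\sigma_i)_i &=& \emptyset \\
[\![\ell]\!](\sigma_i)_i &=& \sigma_\ell \\
[\![\beta \triangleright \beta']\!](\sigma_i)_i &=& [\![\beta]\!](\sigma_i)_i \triangleright [\![\beta']\!](\sigma_i)_i \\
[\![\beta \uplus \beta']\!](\sigma_i)_i &=& [\![\beta]\!](\sigma_i)_i \uplus [\![\beta']\!](\sigma_i)_i
\end{array}
\]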

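In this definition, $\uplus$ is disjoint union, undefined if the sets are not
disjoint, and $\sigma \triangleright \sigma'$ is defined as follows:

\[
\sigma \triangleright \sigma' = \sigma \cup \{ (a_d, a_r) \mid (a_d, a_r) \in \sigma',\ a_d \notin \mathrm{dom}(\sigma) \}
\]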
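
The interpretation over substitutions can be sketched in Python as
well. The encoding below is hypothetical: substitutions are dicts,
None is the empty combinator, an int is a position, and ("over", ...)
and ("disj", ...) stand for ▷ and ⊎. The sketch assumes, as the let*
discussion suggests, that the left operand of ▷ takes precedence.

```python
def override(sigma, sigma2):
    """sigma ▷ sigma2: keep sigma, and add the pairs of sigma2 whose
    left-hand atom is not already in the domain of sigma."""
    out = dict(sigma)
    for a_d, a_r in sigma2.items():
        out.setdefault(a_d, a_r)
    return out

def disjoint_union(sigma, sigma2):
    """sigma ⊎ sigma2: undefined (here, an error) if the domains overlap."""
    clash = sigma.keys() & sigma2.keys()
    if clash:
        raise ValueError(f"not disjoint: {sorted(clash)}")
    return {**sigma, **sigma2}

def interp_subst(beta, substs):
    """Interpret the combinator `beta` over a sequence of substitutions."""
    if beta is None:
        return {}
    if isinstance(beta, int):
        return dict(substs[beta])
    tag, left, right = beta
    l = interp_subst(left, substs)
    r = interp_subst(right, substs)
    return override(l, r) if tag == "over" else disjoint_union(l, r)

# Two clauses that both bind d: position 0 renames d to bb,
# position 1 (the later clauses) renames d to cc.
shadowed = interp_subst(("over", 1, 0), [{"d": "bb"}, {"d": "cc"}])
```

With 1 ▷ 0, the renaming coming from the later clauses wins, which is
how shadowing in let* is reflected at the level of substitutions.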

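Using $[\![\beta]\!](A_i)_i$, we can compute the binders exported from
any value. We call these the free binders of the value. As suggested
above, the free binders of a value are determined using the type.

\[
\begin{array}{rcl}
\mathrm{fb}(\mathrm{Prod}^{\Uparrow\beta_{ex}}_i(\tau_i{\downarrow}\beta_i), \mathrm{prod}_i(v_i)) &::=& [\![\beta_{ex}]\!](\mathrm{fb}(\tau_i, v_i))_i \\
\mathrm{fb}(\tau_0 + \tau_1, \mathrm{inj0}(v)) &::=& \mathrm{fb}(\tau_0, v) \\
\mathrm{fb}(\tau_0 + \tau_1, \mathrm{inj1}(v)) &::=& \mathrm{fb}(\tau_1, v) \\
\mathrm{fb}(\mu X.\tau, v) &::=& \mathrm{fb}(\tau[\mu X.\tau/X], v) \\
\mathrm{fb}(\mathrm{BAtom}, a) &::=& \{a\} \\
\mathrm{fb}(\mathrm{RAtom}, a) &::=& \emptyset
\end{array}
\]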

There are several other useful quantities that we can compute 
using these combinators. First is the set of free references of a term: 

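\[
\begin{array}{rcl}
\mathrm{fr}(\mathrm{Prod}^{\Uparrow\beta_{ex}}_i(\tau_i{\downarrow}\beta_i), \mathrm{prod}_i(v_i)) &::=& \bigcup_i \left( \mathrm{fr}(\tau_i, v_i) \setminus [\![\beta_i]\!](\mathrm{fb}(\tau_j, v_j))_j \right) \\
\mathrm{fr}(\tau_0 + \tau_1, \mathrm{inj0}(v)) &::=& \mathrm{fr}(\tau_0, v) \\
\mathrm{fr}(\tau_0 + \tau_1, \mathrm{inj1}(v)) &::=& \mathrm{fr}(\tau_1, v) \\
\mathrm{fr}(\mu X.\tau, v) &::=& \mathrm{fr}(\tau[\mu X.\tau/X], v) \\
\mathrm{fr}(\mathrm{BAtom}, a) &::=& \emptyset \\
\mathrm{fr}(\mathrm{RAtom}, a) &::=& \{a\}
\end{array}
\]

Next is the set of free atoms of a term, which is just the union of
the free binders and the free references:

\[
\mathrm{fa}(\tau, v) ::= \mathrm{fr}(\tau, v) \cup \mathrm{fb}(\tau, v)
\]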
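
To make these definitions concrete, here is a Python sketch of fb, fr,
and fa for the Expr and LetStarClauses types of Figure 1. Everything
about the encoding is an assumption made for illustration: named types
in a TYPES table stand in for recursive types, sums are tagged rather
than binary, atoms are strings, and the duplicate check of ⊎ is left
out.

```python
def interp(beta, sets):
    """Combinators: None is ∅, an int is a position, tuples are ▷ / ⊎."""
    if beta is None:
        return set()
    if isinstance(beta, int):
        return set(sets[beta])
    _, left, right = beta
    return interp(left, sets) | interp(right, sets)

# ("prod", beta_ex, [(child_type, beta_import), ...]) or ("sum", {tag: type})
TYPES = {
    "Expr": ("sum", {
        "var": "RAtom",
        "app": ("prod", None, [("Expr", None), ("Expr", None)]),
        "lam": ("prod", None, [("BAtom", None), ("Expr", 0)]),
        "let-star": ("prod", None, [("LetStarClauses", None), ("Expr", 0)]),
    }),
    "LetStarClauses": ("sum", {
        "lsc-none": ("prod", None, []),
        "lsc-some": ("prod", ("over", 1, 0), [
            (("prod", 0, [("BAtom", None), ("Expr", None)]), None),
            ("LetStarClauses", 0),
        ]),
    }),
}

def fb(ty, v):
    """Free (exported) binders of value v at type ty."""
    if ty == "BAtom":
        return {v}
    if ty == "RAtom":
        return set()
    if isinstance(ty, str):                       # unfold a named type
        return fb(TYPES[ty], v)
    if ty[0] == "sum":
        return fb(ty[1][v[1]], v[2])
    _, beta_ex, fields = ty
    return interp(beta_ex, [fb(t, c) for (t, _), c in zip(fields, v[1])])

def fr(ty, v):
    """Free references of value v at type ty."""
    if ty == "BAtom":
        return set()
    if ty == "RAtom":
        return {v}
    if isinstance(ty, str):
        return fr(TYPES[ty], v)
    if ty[0] == "sum":
        return fr(ty[1][v[1]], v[2])
    _, _, fields = ty
    exported = [fb(t, c) for (t, _), c in zip(fields, v[1])]
    out = set()
    for (t, beta), c in zip(fields, v[1]):
        out |= fr(t, c) - interp(beta, exported)  # drop names bound by imports
    return out

def fa(ty, v):
    return fr(ty, v) | fb(ty, v)

def var(x): return ("inj", "var", x)
def app(f, a): return ("inj", "app", ("prod", [f, a]))

def clauses(pairs):
    lsc = ("inj", "lsc-none", ("prod", []))
    for name, rhs in reversed(pairs):
        lsc = ("inj", "lsc-some", ("prod", [("prod", [name, rhs]), lsc]))
    return lsc

# (let* ((a z) (b (a a)) (c (b a))) c), with z free
lsc = clauses([("a", var("z")),
               ("b", app(var("a"), var("a"))),
               ("c", app(var("b"), var("a")))])
term = ("inj", "let-star", ("prod", [lsc, var("c")]))
```

The clauses export a, b, and c, while the whole let* expression
exports nothing and has z as its only free atom.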

Last is the set of exposable atoms, which are those non-free 
names that will become free when the value in question is broken 
into subterms. These are the atoms which are on their "last chance" 
for renaming before they become free. This set, only defined on 
products, is equal to the union of the binders exported by each term 
in a sequence, less the terms that are exported to the outside: 

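\[
\mathrm{xa}(\mathrm{Prod}^{\Uparrow\beta_{ex}}_i(\tau_i{\downarrow}\beta_i), \mathrm{prod}_i(v_i))
  ::= \Big( \bigcup_i \mathrm{fb}(\tau_i, v_i) \Big) \setminus \mathrm{fb}(\mathrm{Prod}^{\Uparrow\beta_{ex}}_i(\tau_i{\downarrow}\beta_i), \mathrm{prod}_i(v_i))
\]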
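
A minimal Python sketch of this computation follows, assuming the
binders exported by each child have already been computed. The
function names and the combinator encoding (None for ∅, an int for a
position, tagged tuples for ▷ and ⊎) are hypothetical.

```python
def interp(beta, sets):
    """Interpret a binding combinator over a sequence of atom sets."""
    if beta is None:
        return set()
    if isinstance(beta, int):
        return set(sets[beta])
    _, left, right = beta
    return interp(left, sets) | interp(right, sets)

def exposable(beta_ex, child_binders):
    """Binders exported by some child but not exported by the product."""
    everything = set().union(*child_binders)
    return everything - interp(beta_ex, child_binders)

# lam: Prod(BAtom, Expr↓0) exports nothing, so its binder is exposable.
lam_xa = exposable(None, [{"x"}, set()])

# lsc-some: Prod⇑(1▷0)(...) re-exports everything, so nothing is exposable.
clauses_xa = exposable(("over", 1, 0), [{"a"}, {"b", "c"}])
```

For the lam production the bound name is on its "last chance" for
renaming, whereas a let* clause list passes all of its binders upward.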

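It is also useful to know the support of a binding combinator $\beta$:

\[
\begin{array}{rcl}
\_ \in \_ &\subseteq& \mathbb{N} \times \mathrm{Beta} \\
\ell \in \emptyset &\triangleq& \mathit{false} \\
\ell \in \ell' &\triangleq& \ell = \ell' \\
\ell \in \beta \triangleright \beta' &\triangleq& \ell \in \beta \ \text{or}\ \ell \in \beta' \\
\ell \in \beta \uplus \beta' &\triangleq& \ell \in \beta \ \text{or}\ \ell \in \beta'
\end{array}
\]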
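
The support relation is a simple structural recursion. A Python
sketch, under the same hypothetical encoding of combinators (None for
∅, an int for a position, tagged tuples for ▷ and ⊎):

```python
def in_support(pos, beta):
    """Return True if position `pos` occurs in the combinator `beta`."""
    if beta is None:
        return False
    if isinstance(beta, int):
        return pos == beta
    _, left, right = beta
    return in_support(pos, left) or in_support(pos, right)

# The export combinator of lsc-some mentions positions 1 and 0 only.
lsc_some_export = ("over", 1, 0)
```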

3. Alpha-equivalence 

Our next task is to go from a binding type to a notion of
α-equivalence on values described by that type. Because our binding
types allow for buried binders (i.e., binders that may be an arbitrary
depth from the form that binds them) to be exported, we define two
values to be α-equivalent if both

• they export identical bindings, and

• local (non-exported) bindings can be renamed along with the
names that reference them to make the terms identical.

We use $=_B$ (pronounced "binder-equivalent") for the first relation
and $=_R$ (pronounced "reference-equivalent") for the second.

\[
\_ =_\alpha \_ : \_ \;\subseteq\; \mathrm{Value} \times \mathrm{Value} \times \mathrm{Type}
\]

\[
\dfrac{v =_B v' : \tau \qquad v =_R v' : \tau}{v =_\alpha v' : \tau}\ \text{$\alpha$EQ}
\]



3.1 Binder equivalence 

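Two values are $=_B$ iff their exported (free) binders in the same
positions are identical (references are irrelevant). Note that
Bα-Prod examines only the subterms that are in the support of $\beta_{ex}$,
because non-exported binders are the responsibility of $=_R$.

Here and throughout, we omit the rules for injections and fixed
points, which are trivial.

\[
\_ =_B \_ : \_ \;\subseteq\; \mathrm{Value} \times \mathrm{Value} \times \mathrm{Type}
\]

\[
\dfrac{}{a =_B a : \mathrm{BAtom}}\ \text{B$\alpha$-BAtom}
\qquad
\dfrac{}{a =_B a' : \mathrm{RAtom}}\ \text{B$\alpha$-RAtom}
\]

\[
\dfrac{\forall i \in \beta_{ex}.\ v_i =_B v'_i : \tau_i}
      {\mathrm{prod}_i(v_i) =_B \mathrm{prod}_i(v'_i) : \mathrm{Prod}^{\Uparrow\beta_{ex}}_i(\tau_i{\downarrow}\beta_i)}\ \text{B$\alpha$-Prod}
\]
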
3.2 Reference equivalence 

Calculating $=_R$ is analogous to the conventional notion of
α-equivalence, except that we need to extract and rename the bindings
that are buried in subterms.

3.2.1 Joining the binders

We begin with the $\bowtie$ operator (pronounced "join"). It walks through
both values in lockstep, collecting pairs of corresponding binding
atoms and assigning a common fresh atom for each. The result is a
pair of injective substitutions whose domains are equal to the set of
free binders of the values being joined.

At a product, we do the following on each side: we first walk
through each subterm, recursively generating substitutions for each
binding exported by any of the subterms, making sure that there
is no overlap between the fresh names (i.e. the ranges of the
substitutions) assigned in different subterms. The substitutions for the
subterms are then combined by $\beta_{ex}$ to produce a substitution for the
exported binders of the product term. These two substitutions (the
last two terms of the $\bowtie$ relation) are the output of this relation.

We define $\#$ to be the disjointness operator over names, sets
of names, and values. It is naive, meaning that it entirely ignores
binding structure. Therefore, $a \mathrel{\#} \lambda b.b$ is true, but
$a \mathrel{\#} \lambda a.a$ is false.
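
The naive disjointness check can be sketched in a few lines of Python.
The representation is an assumption for the example: atoms are
strings, and every compound value (or set of names) is an iterable of
smaller values, with tags omitted, so that the term λb.b is written
simply as the pair ("b", "b").

```python
def atoms(v):
    """All atoms occurring in v, ignoring binding structure entirely."""
    if isinstance(v, str):
        return {v}
    return set().union(*(atoms(child) for child in v))

def disjoint(x, y):
    """The naive # check: x and y share no atom at all."""
    return atoms(x).isdisjoint(atoms(y))

fresh_enough = disjoint("a", ("b", "b"))   # a # λb.b
captured = disjoint("a", ("a", "a"))       # a # λa.a
```

The second check fails even though a is bound in λa.a, which is
exactly the sense in which the operator is naive.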

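\[
\_ \bowtie \_ : \_ \to \_ \bowtie \_ \;\subseteq\; \mathrm{Value} \times \mathrm{Value} \times \mathrm{Type} \times \mathrm{Subst} \times \mathrm{Subst}
\]

\[
\dfrac{}{a \bowtie a' : \mathrm{BAtom} \to \{(a, a_{fresh})\} \bowtie \{(a', a_{fresh})\}}\ \text{J-BAtom}
\qquad
\dfrac{}{a \bowtie a' : \mathrm{RAtom} \to \emptyset \bowtie \emptyset}\ \text{J-RAtom}
\]

\[
\dfrac{\begin{array}{c}
\forall i.\ v_i \bowtie v'_i : \tau_i \to \sigma_i \bowtie \sigma'_i
\qquad
\forall i \neq j.\ \mathrm{rng}(\sigma_i) \mathrel{\#} \mathrm{rng}(\sigma_j) \\
\sigma = [\![\beta_{ex}]\!](\sigma_i)_i
\qquad
\sigma' = [\![\beta_{ex}]\!](\sigma'_i)_i
\end{array}}
{\mathrm{prod}_i(v_i) \bowtie \mathrm{prod}_i(v'_i) : \mathrm{Prod}^{\Uparrow\beta_{ex}}_i(\tau_i{\downarrow}\beta_i) \to \sigma \bowtie \sigma'}\ \text{J-Prod}
\]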

For example, consider the two let* expressions we have previously
discussed:

(let* ((a 1)                (let* ((d 1)
       (b (+ a a))                 (d (+ d d))
       (c (* b 5)))                (d (* d 5)))
  (display c))                (display d))

The results of $\bowtie$ on their children in position 1 (the display
expressions) are $\emptyset$ and $\emptyset$, because neither one has any free
binders. Position 0 corresponds to the LetStarClauses, and is
more interesting. $\bowtie$ will nondeterministically generate three names,
which we will choose to be aa, bb, and cc. Then, we will have
$\sigma_0 = \{(a, aa), (b, bb), (c, cc)\}$ and $\sigma'_0 = \{(d, cc)\}$. The different
ranges of these substitutions indicate that some names (the ones
called a and b in the left-hand value) cannot be referred to at all by
references on the right-hand side, due to shadowing.

A more complete derivation of this is shown in Figure 4. The 
first step is straightforward: c and d are unified with each other, and 
the resulting substitutions are merged with the empty substitution. 
In the next step, b and d are unified, but the substitution in position 
1 (which produced cc) takes precedence over the new definition of 
d. Finally, a similar process completes generating the substitutions 
corresponding to the free binders in each LetStarClauses. 
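
The join computation for this example can be sketched in Python. The
sketch makes several simplifying assumptions that are not part of
Romeo: fresh atoms come from a global counter, Expr is treated as an
opaque type that exports no binders (matching the "no free bindings"
steps of the derivation), import annotations are dropped because join
does not use them, and the duplicate check of ⊎ is omitted.

```python
import itertools

_counter = itertools.count()

def fresh():
    """Return an atom that has not been handed out before."""
    return f"_g{next(_counter)}"

def interp_subst(beta, substs):
    """None is ∅, an int is a position, tuples are ▷ / ⊎ (left wins)."""
    if beta is None:
        return {}
    if isinstance(beta, int):
        return dict(substs[beta])
    _, left, right = beta
    return {**interp_subst(right, substs), **interp_subst(left, substs)}

# ("prod", beta_ex, [child types]) or ("sum", {tag: type})
TYPES = {
    "Clause": ("prod", 0, ["BAtom", "Expr"]),
    "LetStarClauses": ("sum", {
        "lsc-none": ("prod", None, []),
        "lsc-some": ("prod", ("over", 1, 0), ["Clause", "LetStarClauses"]),
    }),
}

def join(ty, v, w):
    """Return (sigma, sigma2) renaming the exported binders of v and w alike."""
    if ty == "BAtom":
        g = fresh()
        return {v: g}, {w: g}
    if ty in ("RAtom", "Expr"):          # nothing exported
        return {}, {}
    if isinstance(ty, str):              # unfold a named type
        return join(TYPES[ty], v, w)
    if ty[0] == "sum":
        if v[1] != w[1]:
            raise ValueError("values have different shapes")
        return join(ty[1][v[1]], v[2], w[2])
    _, beta_ex, fields = ty
    pairs = [join(t, cv, cw) for t, cv, cw in zip(fields, v[1], w[1])]
    return (interp_subst(beta_ex, [p[0] for p in pairs]),
            interp_subst(beta_ex, [p[1] for p in pairs]))

def clauses(pairs):
    lsc = ("inj", "lsc-none", ("prod", []))
    for name, rhs in reversed(pairs):
        lsc = ("inj", "lsc-some", ("prod", [("prod", [name, rhs]), lsc]))
    return lsc

left = clauses([("a", "1"), ("b", "(+ a a)"), ("c", "(* b 5)")])
right = clauses([("d", "1"), ("d", "(+ d d)"), ("d", "(* d 5)")])
sigma, sigma2 = join("LetStarClauses", left, right)
```

The left substitution renames a, b, and c to three distinct fresh
atoms, while the right one maps d only to the atom chosen for c, since
the last binding of d shadows the earlier ones.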

3.2.2 Comparison by substitution 

Now we can write the rules for $=_R$. At RAtom, the atoms being
compared are necessarily free and must be identical in order to
be reference-equal. Symmetrically to $=_B$, any two atoms are $=_R$ at
BAtom.

At a product, the information from $\bowtie$ is used by $=_R$ to make
the subterms comparable without requiring context. This is done as
follows: For each pair of subterms $v_i, v'_i$, we use $\bowtie$ to generate a
pair of substitutions $\sigma_i, \sigma'_i$ that rename the binders exported by $v_i$,
$v'_i$ to be identical. We then apply these renamings to the subterms,
as directed by $\beta_i$.

Applying these substitutions to each pair of subterms (resulting
in the new values $[\![\beta_i]\!](\sigma_j)_j(v_i)$ and $[\![\beta_i]\!](\sigma'_j)_j(v'_i)$) allows us
to examine each pair of children in isolation. Note that this
substitution is naive (that is, it disregards types and therefore binding).
Even though we are only interested in the substitution's effect on
free references, this naivete is acceptable because, first, $=_R$ does not
examine free binders, and second, (broadly speaking) the
substitution of un-free names is harmless (this principle is illustrated by
Lemma 3.2).

Because our substitutions are naive, we require that each
substitution's range be disjoint from the values being examined.
Without this requirement, we would have (let* ((x 7)) x) $=_R$
(let* ((y 7)) a) : Expr, witnessed by $\sigma_0 = \{(x, a)\}$ and
$\sigma'_0 = \{(y, a)\}$.

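\[
\_ =_R \_ : \_ \;\subseteq\; \mathrm{Value} \times \mathrm{Value} \times \mathrm{Type}
\]

\[
\dfrac{}{a =_R a' : \mathrm{BAtom}}\ \text{R$\alpha$-BAtom}
\qquad
\dfrac{}{a =_R a : \mathrm{RAtom}}\ \text{R$\alpha$-RAtom}
\]

\[
\dfrac{\begin{array}{c}
\forall i.\ v_i \bowtie v'_i : \tau_i \to \sigma_i \bowtie \sigma'_i \\
\forall i, j.\ \mathrm{rng}(\sigma_i), \mathrm{rng}(\sigma'_i) \mathrel{\#} v_j, v'_j \\
\forall i \neq j.\ \mathrm{rng}(\sigma_i) \mathrel{\#} \mathrm{rng}(\sigma_j) \\
\forall i.\ [\![\beta_i]\!](\sigma_j)_j(v_i) =_R [\![\beta_i]\!](\sigma'_j)_j(v'_i) : \tau_i
\end{array}}
{\mathrm{prod}_i(v_i) =_R \mathrm{prod}_i(v'_i) : \mathrm{Prod}^{\Uparrow\beta_{ex}}_i(\tau_i{\downarrow}\beta_i)}\ \text{R$\alpha$-Prod}
\]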

In our ongoing let* example, the appropriate substitution is a
no-op on the LetStarClauses, which import nothing. Recursive
application of $=_R$ will discover their shared binding structure and
compare them as equal. On the other hand, the expression bodies
will be both transformed into (display cc), which lacks binding
structure, and is naively equal to itself. So, the two expressions
are $=_R$. (They are also trivially $=_B$, and therefore $=_\alpha$.) A complete
derivation of their reference-equivalence, including analyzing the
LetStarClauses themselves, is available at
http://hdl.handle.net/2047/d20005012.

An example that better demonstrates the complexities of renaming
is the event handler example from section 2.1.1. The result of
invoking $\bowtie$ on each pair of children, in order to compare the two
versions for α-equivalence, is in Figure 5.

All of the $\beta$'s, except $\beta_2$ and $\beta_4$, are $\emptyset$. $\beta_2$ is $0 \uplus 1$ and $\beta_4$
is $0 \uplus 3$. The result of performing those substitutions is shown in
Figure 6, establishing the relationship between mouse-evt and the
first b and the relationship between kbd-evt and the second b.

3.3 Lemmas 

The following lemmas establish that $=_\alpha$ has behavior consistent
with an α-equivalence. Each follows from analogous lemmas about
$=_B$ and $=_R$ (where fb and fr replace fa, if applicable).

Lemma 3.1. $=_\alpha$ is an equivalence relation.

Lemma 3.2 (Unfree atoms can be renamed). Suppose $\sigma$ is injective
and $\mathrm{rng}(\sigma) \mathrel{\#} v$ and $\mathrm{dom}(\sigma) \mathrel{\#} \mathrm{fa}(\tau, v), \mathrm{rng}(\sigma)$.
Then $\sigma(v) =_\alpha v : \tau$.

Lemma 3.3 (Good substitutions preserve α-equivalence). Suppose
$\sigma$ is injective and $\mathrm{rng}(\sigma) \mathrel{\#} v, v'$ and
$\mathrm{dom}(\sigma) \mathrel{\#} \mathrm{rng}(\sigma)$. Then
$v =_\alpha v' : \tau \implies \sigma(v) =_\alpha \sigma(v') : \tau$.

Lemma 3.4 (Free atoms are the same for α-equivalent values).
$v =_\alpha v' : \tau \implies \mathrm{fa}(\tau, v) = \mathrm{fa}(\tau, v')$.

4. Romeo 

Romeo is a first-order, typed, side-effect-free language whose
values are abstract syntax trees. It uses types to direct the interpretation
of these trees as syntax trees with binding, and to direct the
execution of expressions in a way that respects that binding structure.
There are three parts to this:

• First, the execution semantics ensures that whenever the
program causes a name to escape the context in which it is defined,
a FAULT is produced.

• Second, we provide theorems guaranteeing that at any point
in execution, the dynamic environment could be replaced by
one with α-equivalent values, and execution would still
proceed to a value α-equivalent to what it otherwise would have.
Furthermore, execution is deterministic up to α: that is, the
nondeterministic choices that are made (e.g. for fresh identifiers) do
not change the α-equivalence class of the result.

• Last, we provide a deduction system to generate proof
obligations which, if satisfied, guarantee that escape (and thus, FAULT)
will never occur.

The syntax of Romeo is given as follows: 

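\[
\begin{array}{rcl}
p \in \mathrm{Prog} &::=& fD \ldots\ e : \tau \\
fD \in \mathrm{FnDef} &::=& (\texttt{define-fn}\ (f\ x : \tau \ldots\ \texttt{pre}\ C) : \tau\ \ e\ \ \texttt{post}\ C) \\
e \in \mathrm{Expr} &::=& (f\ x \ldots) \\
 &\mid& (\texttt{fresh}\ x\ \texttt{in}\ e) \\
 &\mid& (\texttt{let}\ x\ \texttt{where}\ C\ \texttt{be}\ e\ \texttt{in}\ e) \\
 &\mid& (\texttt{case}\ x\ (x\ e)\ (x\ e)) \\
 &\mid& (\texttt{open}\ x\ (x \ldots)\ e) \\
 &\mid& (\texttt{if}\ x\ \texttt{equals}\ x\ e\ e) \\
 &\mid& e^{qlit} \\
e^{qlit} \in \mathrm{QuasiLit} &::=& x \\
 &\mid& (\texttt{ref}\ x) \\
 &\mid& (\texttt{inj}_0\ e^{qlit}\ \tau) \\
 &\mid& (\texttt{inj}_1\ \tau\ e^{qlit}) \\
 &\mid& (\texttt{prod}^{\Uparrow\beta_{ex}}_i\ e^{qlit}_i{\downarrow}\beta_i)
\end{array}
\]

Here $C$ ranges over a language of invariants from which the
proof obligations for static safety are constructed. Romeo's
operational semantics does not refer to these invariants. This
sub-language is discussed in Section 6.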
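Define:
  $v_{0,1,1}$ = ((c (* b 5)))  and  $v'_{0,1,1}$ = ((d (* d 5)))
  $v_{0,1}$ = ((b (+ a a)) (c (* b 5)))  and  $v'_{0,1}$ = ((d (+ d d)) (d (* d 5)))
  $v_0$ = ((a 1) (b (+ a a)) (c (* b 5)))  and  $v'_0$ = ((d 1) (d (+ d d)) (d (* d 5)))

(3.1)   $c \bowtie d : \mathrm{BAtom} \to \{(c, cc)\} \bowtie \{(d, cc)\}$   [J-BAtom]
(3.2)   $\texttt{(* b 5)} \bowtie \texttt{(* d 5)} : \mathrm{Expr} \to \emptyset \bowtie \emptyset$   [no free bindings]
(3.3)   $\texttt{(c (* b 5))} \bowtie \texttt{(d (* d 5))} : \mathrm{Prod}^{\Uparrow 0}(\mathrm{BAtom}, \mathrm{Expr}) \to \{(c, cc)\} \bowtie \{(d, cc)\}$   [J-Prod, 3.1 and 3.2]
(3.4)   $v_{0,1,1} \bowtie v'_{0,1,1} : \mathrm{LetStarClauses} \to \{(c, cc)\} \bowtie \{(d, cc)\}$   [J-Prod, 3.3]
(3.5)   $b \bowtie d : \mathrm{BAtom} \to \{(b, bb)\} \bowtie \{(d, bb)\}$   [J-BAtom]
(3.6)   $\texttt{(+ a a)} \bowtie \texttt{(+ d d)} : \mathrm{Expr} \to \emptyset \bowtie \emptyset$   [no free bindings]
(3.7)   $\texttt{(b (+ a a))} \bowtie \texttt{(d (+ d d))} : \mathrm{Prod}^{\Uparrow 0}(\mathrm{BAtom}, \mathrm{Expr}) \to \{(b, bb)\} \bowtie \{(d, bb)\}$   [J-Prod, 3.5 and 3.6]
(3.8)   $[\![1 \triangleright 0]\!](\{(d, cc)\}, \{(d, bb)\}) = \{(d, cc)\}$   [by def. of $\triangleright$]
(3.9)   $v_{0,1} \bowtie v'_{0,1} : \mathrm{LetStarClauses} \to \{(b, bb), (c, cc)\} \bowtie \{(d, cc)\}$   [J-Prod, 3.4, 3.7, and 3.8]
(3.10)  $a \bowtie d : \mathrm{BAtom} \to \{(a, aa)\} \bowtie \{(d, aa)\}$   [J-BAtom]
(3.11)  $1 \bowtie 1 : \mathrm{Expr} \to \emptyset \bowtie \emptyset$   [no free bindings]
(3.12)  $\texttt{(a 1)} \bowtie \texttt{(d 1)} : \mathrm{Prod}^{\Uparrow 0}(\mathrm{BAtom}, \mathrm{Expr}) \to \{(a, aa)\} \bowtie \{(d, aa)\}$   [J-Prod, 3.10 and 3.11]
(3.13)  $[\![1 \triangleright 0]\!](\{(d, cc)\}, \{(d, aa)\}) = \{(d, cc)\}$   [by def. of $\triangleright$]
(3.14)  $v_0 \bowtie v'_0 : \mathrm{LetStarClauses} \to \{(a, aa), (b, bb), (c, cc)\} \bowtie \{(d, cc)\}$   [J-Prod, 3.12, 3.9, and 3.13]

Figure 4. Example derivation of $\bowtie$ for LetStarClauses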



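$j$ | $v_j \bowtie v'_j : \tau_j$ | $\sigma_j \bowtie \sigma'_j$
0 | $\texttt{gui-elt} \bowtie \texttt{a} : \mathrm{BAtom}$ | $\{(\texttt{gui-elt}, gg)\} \bowtie \{(\texttt{a}, gg)\}$
1 | $\texttt{mouse-evt} \bowtie \texttt{b} : \mathrm{BAtom}$ | $\{(\texttt{mouse-evt}, mm)\} \bowtie \{(\texttt{b}, mm)\}$
2 | $\texttt{(deal-with gui-elt mouse-evt)} \bowtie \texttt{(deal-with a b)} : \mathrm{Expr}$ | $\emptyset \bowtie \emptyset$
3 | $\texttt{kbd-evt} \bowtie \texttt{b} : \mathrm{BAtom}$ | $\{(\texttt{kbd-evt}, kk)\} \bowtie \{(\texttt{b}, kk)\}$
4 | $\texttt{(tag gui-elt (text-of kbd-evt))} \bowtie \texttt{(tag a (text-of b))} : \mathrm{Expr}$ | $\emptyset \bowtie \emptyset$

Figure 5. Substitutions generated for the handler example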



$[\![0 \uplus 1]\!](\sigma_j)_j(\texttt{(deal-with gui-elt mouse-evt)}) = \texttt{(deal-with gg mm)}$
$[\![0 \uplus 1]\!](\sigma'_j)_j(\texttt{(deal-with a b)}) = \texttt{(deal-with gg mm)}$
$[\![0 \uplus 3]\!](\sigma_j)_j(\texttt{(tag gui-elt (text-of kbd-evt))}) = \texttt{(tag gg (text-of kk))}$
$[\![0 \uplus 3]\!](\sigma'_j)_j(\texttt{(tag a (text-of b))}) = \texttt{(tag gg (text-of kk))}$

Figure 6. Result of substitution in the handler example

The form of the execution judgment is:

\[
\Gamma \vdash_{exe} \langle e, \rho \rangle \Rightarrow^{k} w
\]

The $k$ argument indicates the number of execution steps taken to
produce the result in question.

Observe that some execution rules depend on the type
environment $\Gamma$. This is because the binding structures of values are
represented in their types ($\tau$), but not in their runtime representations
($v$). Therefore, type erasure is not possible: the meaning of values
(and thus the behavior of those rules) depends on type information.

We can now give the rules for execution in Romeo. Rules that
introduce names come in two forms, -Ok and -Escape. In each
case, the only difference is that FAULT occurs in the -Escape
case. A fault indicates that a name has escaped the scope that
created it (E-Fresh-*) or exposed it (E-Open-*). Much of the
rest of the machinery in those rules is about ensuring that newly
introduced names do not collide with each other or with names in
the environment.



Typechecking is largely straightforward. In the body of open,
the variables $x \ldots$ are given the types of the subterms of the
scrutinee $x$, and in the body of fresh, $x$ is bound to a new name at the
type BAtom. In order to use that name as a reference, the ref form
takes an argument of type BAtom and returns it as a RAtom.

We annotate injections with the types of the arm-not-taken, and
product constructors with their binding structure. This allows us to
write a function $\mathrm{typeof}(\Gamma, e)$ whose definition is routine.

4.1 Operational Semantics 

We define Romeo's execution in big-step style. 

We begin with auxiliary definitions that we will need: 

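\[
\begin{array}{rcl}
w \in \mathrm{Result} &::=& v \mid \mathrm{FAULT} \\
\rho \in \mathrm{ValEnv} &::=& \epsilon \mid \rho[x \to v] \\
\Gamma \in \mathrm{TypeEnv} &::=& \epsilon \mid \Gamma, x{:}\tau \\
\mathrm{fa}_{env}(\Gamma, \rho) &=& \bigcup_{x \in \mathrm{dom}(\Gamma)} \mathrm{fa}(\Gamma(x), \rho(x))
\end{array}
\]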
4.2 Execution rules

We begin with the rules for evaluating fresh expressions. The
rules require that the new name not occur in the environment $\rho$.
Our determinacy theorems (Theorems 5.1 and 5.2) guarantee that
the choice of the new name will not affect the result (up to
α-equivalence). We have two versions of the rule: E-Fresh-Escape,
which returns FAULT when the new name appears in the result of
executing the body $e$, and E-Fresh-Ok, which returns $w$ when that
is not the case.

During the execution of $e$, $x$ is treated as a BAtom. It is
convertible to a RAtom by the expression (ref x).

The hypothesis $\tau = \mathrm{typeof}((\Gamma, x{:}\mathrm{BAtom}), e)$ is needed to
synthesize the type $\tau$ in order to determine the free atoms of the
result, in order to determine whether to produce FAULT or not.
Determining the type, of course, is entirely static and could be
precomputed once rather than at each evaluation.

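$$
\frac{\begin{array}{c}
\tau = \mathit{typeof}((\Gamma, x{:}\mathrm{BAtom}), e) \qquad a \notin \mathit{fa}_{env}(\Gamma, \rho) \\
\Gamma, x{:}\mathrm{BAtom} \vdash_{exe} \langle e, \rho[x \to a] \rangle \Rightarrow^{k} w \\
w = \mathrm{FAULT} \;\lor\; a \notin \mathit{fa}(\tau, w)
\end{array}}
{\Gamma \vdash_{exe} \langle (\mathtt{fresh}\ x\ \mathtt{in}\ e), \rho \rangle \Rightarrow^{k+1} w}
\quad \text{E-Fresh-Ok}
$$

$$
\frac{\begin{array}{c}
\tau = \mathit{typeof}((\Gamma, x{:}\mathrm{BAtom}), e) \qquad a \notin \mathit{fa}_{env}(\Gamma, \rho) \\
\Gamma, x{:}\mathrm{BAtom} \vdash_{exe} \langle e, \rho[x \to a] \rangle \Rightarrow^{k} w \\
w \neq \mathrm{FAULT} \;\land\; a \in \mathit{fa}(\tau, w)
\end{array}}
{\Gamma \vdash_{exe} \langle (\mathtt{fresh}\ x\ \mathtt{in}\ e), \rho \rangle \Rightarrow^{k+1} \mathrm{FAULT}}
\quad \text{E-Fresh-Escape}
$$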
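The two fresh rules can be illustrated with a small Python sketch. It is not Romeo's implementation: the names `Atom`, `free_atoms`, and `eval_fresh` are hypothetical, values are plain tuples without binders (so every atom in a value counts as free, where Romeo would consult the type τ), and the body is modeled as a Python function from environments to results.

```python
from dataclasses import dataclass
from itertools import count

FAULT = "FAULT"  # stands in for the FAULT result of the semantics


@dataclass(frozen=True)
class Atom:
    name: str


def free_atoms(value):
    """Atoms occurring in a value (assumption: no binders, so all are free)."""
    if isinstance(value, Atom):
        return {value}
    if isinstance(value, tuple):
        return set().union(*(free_atoms(v) for v in value))
    return set()


def eval_fresh(var, body, env):
    """Run `body` with `var` bound to an atom that does not occur in `env`."""
    used = set().union(*(free_atoms(v) for v in env.values()))
    atom = next(Atom(f"g{i}") for i in count() if Atom(f"g{i}") not in used)
    result = body({**env, var: atom})
    if result != FAULT and atom in free_atoms(result):
        return FAULT  # the new name escaped: the E-Fresh-Escape case
    return result     # the E-Fresh-Ok case
```

For instance, `eval_fresh("x", lambda env: env["x"], {})` yields `FAULT` because the body returns the new atom itself, while a body that only compares the new atom with existing ones returns its result unchanged.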

The next pair of rules deals with destructuring a product. Given a
value ρ(x_obj) = prod(v_obj,0, ..., v_obj,n), the open expression
chooses an α-variant prod(v_0, ..., v_n) and binds the resulting
pieces to the variables x_i. In order to determine α-equivalence,
the type τ_obj is needed. In this way, types control the run-time
behavior of Romeo programs. The names in the α-variant must be
distinct both from names in the environment ρ and from each other
(to the extent that they are not actually related by binding). This
is taken care of by a subsidiary judgment ⊢suff-disj. See section
4.2.2 for a more detailed discussion.

As with fresh, we have two rules which branch on whether 
any of the new names appear in the result of the body e. We test 
for escaped names by comparing the free atoms in the result with 
the exportable atoms of the renamed input. This suffices for safety 
because the only atoms that can become free are those that are 
exportable. 

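$$
\frac{\begin{array}{c}
\rho(x_{obj}) =_\alpha \mathit{prod}_i(v_i) : \tau_{obj} \qquad \Gamma \vdash_{type} x_{obj} : \tau_{obj} \\
\tau_{obj} = \mathrm{Prod}^{\Uparrow\beta_{ex}}_{i}(\tau_i{\downarrow}\beta_i) \qquad \tau = \mathit{typeof}(\Gamma, e) \\
\mathit{fa}_{env}(\Gamma, \rho) \vdash_{suff\text{-}disj} \mathit{prod}_i(v_i) : \tau_{obj} \\
\Gamma, (x_i{:}\tau_i)_i \vdash_{exe} \langle e, \rho[x_i \to v_i]_i \rangle \Rightarrow^{k} w \\
w = \mathrm{FAULT} \;\lor\; \mathit{xa}(\tau_{obj}, \mathit{prod}_i(v_i)) \mathrel{\#} \mathit{fa}(\tau, w)
\end{array}}
{\Gamma \vdash_{exe} \langle (\mathtt{open}\ x_{obj}\ ((x_i)_i)\ e), \rho \rangle \Rightarrow^{k+1} w}
\quad \text{E-Open-Ok}
$$

$$
\frac{\begin{array}{c}
\rho(x_{obj}) =_\alpha \mathit{prod}_i(v_i) : \tau_{obj} \qquad \Gamma \vdash_{type} x_{obj} : \tau_{obj} \\
\tau_{obj} = \mathrm{Prod}^{\Uparrow\beta_{ex}}_{i}(\tau_i{\downarrow}\beta_i) \qquad \tau = \mathit{typeof}(\Gamma, e) \\
\mathit{fa}_{env}(\Gamma, \rho) \vdash_{suff\text{-}disj} \mathit{prod}_i(v_i) : \tau_{obj} \\
\Gamma, (x_i{:}\tau_i)_i \vdash_{exe} \langle e, \rho[x_i \to v_i]_i \rangle \Rightarrow^{k} w \\
w \neq \mathrm{FAULT} \;\land\; \lnot\left(\mathit{xa}(\tau_{obj}, \mathit{prod}_i(v_i)) \mathrel{\#} \mathit{fa}(\tau, w)\right)
\end{array}}
{\Gamma \vdash_{exe} \langle (\mathtt{open}\ x_{obj}\ ((x_i)_i)\ e), \rho \rangle \Rightarrow^{k+1} \mathrm{FAULT}}
\quad \text{E-Open-Escape}
$$

To simplify the deduction system, we require variables in some
places where expressions would be more natural (like function
arguments or x_obj in open). As a result, programs are written in
(roughly) A-normal form, naming intermediate results with let.
There are two evaluation rules for let, depending on whether
calculating e_val faults. As noted above, the constraint C is
ignored at run-time.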

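$$
\frac{\begin{array}{c}
\Gamma \vdash_{exe} \langle e_{val}, \rho \rangle \Rightarrow v_{val} \qquad \tau_{val} = \mathit{typeof}(\Gamma, e_{val}) \\
\Gamma, x{:}\tau_{val} \vdash_{exe} \langle e_{body}, \rho[x \to v_{val}] \rangle \Rightarrow w
\end{array}}
{\Gamma \vdash_{exe} \langle (\mathtt{let}\ x\ \mathtt{where}\ C\ \mathtt{be}\ e_{val}\ \mathtt{in}\ e_{body}), \rho \rangle \Rightarrow w}
\quad \text{E-Let}
$$

$$
\frac{\Gamma \vdash_{exe} \langle e_{val}, \rho \rangle \Rightarrow^{k} \mathrm{FAULT}}
{\Gamma \vdash_{exe} \langle (\mathtt{let}\ x\ \mathtt{where}\ C\ \mathtt{be}\ e_{val}\ \mathtt{in}\ e_{body}), \rho \rangle \Rightarrow^{k+1} \mathrm{FAULT}}
\quad \text{E-Let-Fail}
$$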
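The A-normal-form restriction mentioned above can be made concrete with a short Python sketch. It is only an illustration under simplifying assumptions: expressions are either variable names (strings) or tuples `(function, arg, ...)`, the helper `to_anf` and the temporary names `t0, t1, ...` are hypothetical, and each emitted binding corresponds to one let whose right-hand side is a call on variables only.

```python
from itertools import count


def to_anf(expr, fresh=None):
    """Flatten nested calls into a list of (name, call) bindings.

    Returns (bindings, result_name); every call in `bindings` has only
    variable names as arguments, as the core language requires.
    """
    fresh = count() if fresh is None else fresh
    if isinstance(expr, str):          # already a variable reference
        return [], expr
    function, *args = expr
    bindings, names = [], []
    for arg in args:
        arg_bindings, arg_name = to_anf(arg, fresh)
        bindings += arg_bindings
        names.append(arg_name)
    tmp = f"t{next(fresh)}"            # assumed not to clash with user names
    bindings.append((tmp, (function, *names)))
    return bindings, tmp
```

Applied to `("f", ("g", "x"), "y")`, the sketch first names the inner call and then the outer one, which mirrors how nested arguments would be rewritten into a chain of lets.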

As in Pure FreshML [13], we assume that our expressions are eval- 
uated in a context of function definitions, so that from a function 
name we can retrieve the function's formals and body. Since this 
context is constant throughout an execution, it is elided in the eval- 
uation judgment. 

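$$
\frac{\begin{array}{c}
\mathit{body}(f) = e \qquad \mathit{formals}(f) = (x_{formal,i}{:}\tau_{formal,i})_i \\
(x_{formal,i}{:}\tau_{formal,i})_i \vdash_{exe} \langle e, [x_{formal,i} \to \rho(x_{actual,i})]_i \rangle \Rightarrow^{k} w
\end{array}}
{\Gamma \vdash_{exe} \langle (f\ (x_{actual,i})_i), \rho \rangle \Rightarrow^{k+1} w}
\quad \text{E-Call}
$$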
The remainder of the rules are routine. For simplicity's sake, the 
equality test construct works only on atoms. 



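$$
\frac{\begin{array}{c}
\rho(x_{obj}) = \mathit{inj0}(v_0) \qquad \Gamma(x_{obj}) = \tau_0 + \tau_1 \\
\Gamma, x_0{:}\tau_0 \vdash_{exe} \langle e_0, \rho[x_0 \to v_0] \rangle \Rightarrow^{k} w
\end{array}}
{\Gamma \vdash_{exe} \langle (\mathtt{case}\ x_{obj}\ (x_0\ e_0)\ (x_1\ e_1)), \rho \rangle \Rightarrow^{k+1} w}
\quad \text{E-Case-Left}
$$

$$
\frac{\begin{array}{c}
\rho(x_{obj}) = \mathit{inj1}(v_1) \qquad \Gamma(x_{obj}) = \tau_0 + \tau_1 \\
\Gamma, x_1{:}\tau_1 \vdash_{exe} \langle e_1, \rho[x_1 \to v_1] \rangle \Rightarrow^{k} w
\end{array}}
{\Gamma \vdash_{exe} \langle (\mathtt{case}\ x_{obj}\ (x_0\ e_0)\ (x_1\ e_1)), \rho \rangle \Rightarrow^{k+1} w}
\quad \text{E-Case-Right}
$$

$$
\frac{\begin{array}{c}
\rho(x_l) = a \qquad \rho(x_r) = b \qquad a = b \\
\Gamma \vdash_{exe} \langle e_0, \rho \rangle \Rightarrow^{k} w
\end{array}}
{\Gamma \vdash_{exe} \langle (\mathtt{if}\ x_l\ \mathtt{equals}\ x_r\ e_0\ e_1), \rho \rangle \Rightarrow^{k+1} w}
\quad \text{E-If-Yes}
$$

$$
\frac{\begin{array}{c}
\rho(x_l) = a \qquad \rho(x_r) = b \qquad a \neq b \\
\Gamma \vdash_{exe} \langle e_1, \rho \rangle \Rightarrow^{k} w
\end{array}}
{\Gamma \vdash_{exe} \langle (\mathtt{if}\ x_l\ \mathtt{equals}\ x_r\ e_0\ e_1), \rho \rangle \Rightarrow^{k+1} w}
\quad \text{E-If-No}
$$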

The E-PROG rule initiates evaluation of a program. It is notionally 
responsible for setting up the (non-notated) function context. 



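$$
\frac{\epsilon \vdash_{exe} \langle e, \epsilon \rangle \Rightarrow^{k} w}
{\vdash_{exe} \mathit{fD} \cdots\ e \Rightarrow^{k+1} w}
\quad \text{E-Prog}
$$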

4.2.1 Quasi-Literals 

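The last language component is a category called quasi-literals, so
called because they look like literal syntax for object-level syntax
objects, except that they contain variable references (which denote
values), not literal atoms. Of course, those variables may refer to
atom values generated by fresh. Quasi-literals also contain some
type information to avoid the need for type inference.

$$
\frac{[\![ e^{qlit} ]\!]_\rho = v}
{\Gamma \vdash_{exe} \langle e^{qlit}, \rho \rangle \Rightarrow v}
\quad \text{E-QLit}
$$

Their evaluation is specified by the following rules:

$$
\begin{array}{rcl}
[\![ \_ ]\!]_{\_} & : & \mathit{QuasiLit} \times \mathit{ValEnv} \to \mathit{Value} \\
[\![ x ]\!]_\rho & = & \rho(x) \\
[\![ (\mathtt{ref}\ x) ]\!]_\rho & = & \rho(x) \\
[\![ (\mathtt{inj0}\ e^{qlit}\ \tau) ]\!]_\rho & = & \mathit{inj0}([\![ e^{qlit} ]\!]_\rho) \\
[\![ (\mathtt{inj1}\ \tau\ e^{qlit}) ]\!]_\rho & = & \mathit{inj1}([\![ e^{qlit} ]\!]_\rho) \\
[\![ (\mathtt{prod}^{\Uparrow\beta_{ex}}\ (e^{qlit}_i{\downarrow}\beta_i)_i) ]\!]_\rho & = & \mathit{prod}_i([\![ e^{qlit}_i ]\!]_\rho)
\end{array}
$$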
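The quasi-literal rules above amount to a structural recursion that looks variables up in the environment. The following Python sketch shows that recursion under stated assumptions: quasi-literals are encoded as tagged tuples (a hypothetical encoding, not Romeo syntax), and the type and binding annotations are dropped because they play no role in computing the value.

```python
def eval_qlit(qlit, env):
    """Evaluate a quasi-literal encoded as a tagged tuple.

    ("var", x) and ("ref", x) look x up in the environment; injections
    and products rebuild the same constructor around evaluated children.
    """
    tag = qlit[0]
    if tag in ("var", "ref"):
        return env[qlit[1]]
    if tag in ("inj0", "inj1"):
        return (tag, eval_qlit(qlit[1], env))
    if tag == "prod":
        return ("prod",) + tuple(eval_qlit(child, env) for child in qlit[1:])
    raise ValueError(f"unknown quasi-literal form: {tag!r}")
```

Note that `var` and `ref` evaluate identically in this sketch, matching the two rules that both return ρ(x); the distinction between them only matters to the type system.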



4.2.2 Sufficient Disjointness 

The requirement that evaluation be insensitive to α-equivalent
inputs leads to strong requirements on the way that open destructures
values. Consider the let* example above:



(let* ((a 1)
       (b (+ a a))
       (c (* b 5)))
  (display c))



(let* ((d 1)
       (d (+ d d))
       (d (* d 5)))
  (display d))



These are α-equivalent, but if they were destructured without
renaming, we would have d = d = d, even though a ≠ b ≠ c,
violating our goal of being indifferent to α-conversion.¹ Therefore,
we need to freshen each d to a distinct new name, e.g.

((aa 1)
 (bb (+ aa aa))
 (cc (* bb 5)))

The rule to ensure this is that, before destructuring, we must
α-convert values such that the resulting renamed binders are disjoint
from each other and from any names that appear in the environment.
This gives rise to the hypothesis

$$
\mathit{fa}_{env}(\Gamma, \rho) \vdash_{suff\text{-}disj} \mathit{prod}_i(v_i) : \tau_{obj}
$$

in the E-Open-* rules. The ⊢suff-disj judgment here checks that
binders exported by prod_i(v_i)'s non-exported subterms are distinct
from each other, and that the exposable atoms are disjoint from
the free atoms in the environment.
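The freshening step described above can be sketched in Python for the special case of let* clauses. This is an illustration only: names are plain strings, clause values are assumed to contain no binders of their own, and the function names `rename` and `freshen_let_star` as well as the generated names `v0, v1, ...` are hypothetical. Each binder scopes over the later clauses and the body, so a value is renamed using only the binders that precede it.

```python
from itertools import count


def rename(expr, mapping):
    """Apply a name-to-name mapping to a flat S-expression."""
    if isinstance(expr, str):
        return mapping.get(expr, expr)
    if isinstance(expr, tuple):
        return tuple(rename(e, mapping) for e in expr)
    return expr  # numbers and other literals


def freshen_let_star(clauses, body, avoid):
    """Give every binder a new name distinct from `avoid` and from the others."""
    avoid = set(avoid)
    candidates = (f"v{i}" for i in count())
    mapping, new_clauses = {}, []
    for name, value in clauses:
        new_value = rename(value, mapping)      # sees earlier binders only
        new_name = next(n for n in candidates if n not in avoid)
        avoid.add(new_name)
        mapping = {**mapping, name: new_name}   # later clauses see this binder
        new_clauses.append((new_name, new_value))
    return new_clauses, rename(body, mapping)
```

With the same `avoid` set, the two let* terms from the text (binders a, b, c versus d, d, d) freshen to identical results, which is the behavior the disjointness requirement is meant to guarantee.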

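To check the first part of that, we define the judgment

$$
\vdash_{bndrs\text{-}disj} v : \tau,
$$

which checks that the exported binders in v (as determined by the
type τ) are disjoint from each other.

$$
\frac{}{\vdash_{bndrs\text{-}disj} a : \mathrm{BAtom}} \quad \text{BD-BAtom}
\qquad
\frac{}{\vdash_{bndrs\text{-}disj} a : \mathrm{RAtom}} \quad \text{BD-RAtom}
$$

$$
\frac{\begin{array}{c}
\forall i, j \in \beta_{ex}.\ i \neq j \Rightarrow \mathit{fb}(\tau_i, v_i) \mathrel{\#} \mathit{fb}(\tau_j, v_j) \\
\vdash_{bndrs\text{-}disj} v_i : \tau_i
\end{array}}
{\vdash_{bndrs\text{-}disj} \mathit{prod}_i(v_i) : \mathrm{Prod}^{\Uparrow\beta_{ex}}_{i}(\tau_i{\downarrow}\beta_i)}
\quad \text{BD-Prod}
$$

We can now define ⊢suff-disj:

$$
\frac{\begin{array}{c}
\forall i \notin \beta_{ex}.\ \vdash_{bndrs\text{-}disj} v_i : \tau_i \\
\forall i, j \notin \beta_{ex}.\ \mathit{fb}(\tau_i, v_i) \mathrel{\#} \mathit{fb}(\tau_j, v_j) \\
\mathit{xa}(\mathit{prod}_i(v_i), \mathrm{Prod}^{\Uparrow\beta_{ex}}_{i}(\tau_i{\downarrow}\beta_i)) \mathrel{\#} A
\end{array}}
{A \vdash_{suff\text{-}disj} \mathit{prod}_i(v_i) : \mathrm{Prod}^{\Uparrow\beta_{ex}}_{i}(\tau_i{\downarrow}\beta_i)}
\quad \text{Suff-Disj}
$$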
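The set-level content of these checks can be sketched in Python. The sketch is deliberately shallow and rests on assumptions: the caller supplies the binder set of each relevant child and the set of exposable atoms directly (Romeo computes them from types via fb and xa), disjointness between children is read as pairwise over distinct children, and the recursive descent into subterms is omitted. The function names are hypothetical.

```python
from itertools import combinations


def pairwise_disjoint(sets):
    """True when no two distinct sets in the list share an element."""
    return all(a.isdisjoint(b) for a, b in combinations(sets, 2))


def sufficiently_disjoint(binders_per_child, exposable, env_atoms):
    """Shallow analogue of the sufficient-disjointness check.

    binders_per_child: binder sets of the children being compared
    exposable:         atoms the opened value would expose
    env_atoms:         free atoms of the current environment
    """
    return pairwise_disjoint(binders_per_child) and exposable.isdisjoint(env_atoms)
```

The un-freshened let* example with three binders all named d fails the first condition, while the freshened version with aa, bb, and cc passes as long as none of those names already occurs in the environment.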



¹ In order to compare them, of course, the LetStarClauses would need to
be destructured further. For simplicity, we consider the "last chance" for
renaming to be the outermost level to which a name is exported, even if it is
shadowed at that point.



4.3 Example 

For an example, we write code that translates between two lan- 
guages: from the lambda calculus augmented with a let* construct 
into the plain lambda calculus. 

Our code, in Figure 2, mentions types defined in Figure 1. 
It is written in Romeo-L [10], which is a friendlier front-end to 
Romeo. For our purposes, the important differences are that the 
arguments to function calls and the scrutinees of open and case 
may be arbitrary expressions (not just variable references), and that 
Romeo-L can infer the strongest possible constraint C for let, so 
we may omit it. Furthermore, it will turn out (see section 6) that we 
need no pre- or post-condition from convert to show the absence of 
FAULT, so those constraints are also omitted. 

Additionally, we have chosen to use more readable n-way sum 
types. This means that our case construct can branch 4 ways de- 
pending on whether the Expr it examines is a variable reference, an 
application, a lambda abstraction, or a let-star statement, and that 
injections take (as a subscript) a description of the choice that they 
are constructing. 

Lines 3-5 are straightforward traversal of the existing Expr 
forms that are already forms in the core language (but, since their 
subterms might not be, they still need to be converted by recursively 
invoking convert). 

Lines 6-11 destructure let* forms and handle the trivial case, 
where the let* does not have any arms. Line 12 recursively con- 
verts the body and all but the first arm of the let*, calling the 
result e-rest. Finally, line 13 constructs a beta-redex in the object 
language to bind the first arm's name to its value expression in e- 
rest. 
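The shape of the translation just described can be approximated in plain Python. This is not the Romeo-L code of Figure 2: terms are encoded as tagged tuples with string names (a hypothetical encoding), nothing here enforces α-equivalence or checks for escaping names, and the sketch only mirrors the control flow (traverse core forms, peel one let* clause at a time, build a beta-redex).

```python
def convert(expr):
    """Translate lambda calculus with let* into plain lambda calculus."""
    tag = expr[0]
    if tag == "var":
        return expr
    if tag == "app":
        return ("app", convert(expr[1]), convert(expr[2]))
    if tag == "lambda":
        return ("lambda", expr[1], convert(expr[2]))
    if tag == "let*":
        clauses, body = expr[1], expr[2]
        if not clauses:                      # trivial case: no arms left
            return convert(body)
        (bv, val_expr), rest = clauses[0], clauses[1:]
        e_rest = convert(("let*", rest, body))
        # Beta-redex binding the first arm's name to its value in e_rest.
        return ("app", ("lambda", bv, e_rest), convert(val_expr))
    raise ValueError(f"unknown expression form: {tag!r}")
```

For example, a let* with arms a = x and b = a and body b becomes an application of a lambda binding a, whose body is in turn an application of a lambda binding b.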

5. Romeo respects α-equivalence

We are now ready to prove our main theorem: that Romeo respects
α-equivalence. This is stronger than the corresponding result for
Pure FreshML [13], which claims only that fresh names do not
escape (see section 7.2 for a discussion).

Romeo is nondeterministic in its choice of names in the
E-Fresh-* and E-Open-* rules. This complicates the definition
of respecting α-equivalence. We show two results: first, that if
executions in two α-equivalent environments both terminate, then
their results are α-equivalent, and second, if one of two
α-equivalent environments yields a result, then the other one must
yield at least one α-equivalent result as well.

For each of these theorems, the vast majority of the complexity
is contained in the cases for E-Fresh-* and E-Open-*.

Since we must account for faulting, we extend the definition of
α-equivalence to assert that FAULT =_α FAULT.

Complete proofs of all the theorems and lemmas we mention 
in this paper are available at http://hdl.handle.net/2047/ 
d20005013. 

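Theorem 5.1 (Determinism up to α-equivalence, termination-insensitive
version).

If $\tau = \mathit{typeof}(\Gamma, e)$
and $\rho =_\alpha \rho' : \Gamma$
and $\Gamma \vdash_{exe} \langle e, \rho \rangle \Rightarrow^{k} w$
and $\Gamma \vdash_{exe} \langle e, \rho' \rangle \Rightarrow^{k'} w'$
then $w =_\alpha w' : \tau$.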

Proof. (Sketch) The major problem in the proof is that the two
executions will potentially generate different fresh names in
E-Fresh-* and E-Open-*. Hence, even if the environments start
out α-equivalent, they will not stay α-equivalent. For example, a
fresh statement nondeterministically introduces a new free name
into the environment. Therefore we must generalize our induction
hypothesis to account for the ways in which ρ and ρ′ diverge from
α-equivalence.

We account for this divergence by introducing two injective
substitutions to unify the names introduced by the two executions.
So our induction hypothesis says that if σ ∘ ρ =_α σ′ ∘ ρ′ : Γ for a
pair of injective substitutions σ and σ′, then the results will be
α-equivalent, modulo the same transformation (i.e.
σ(w) =_α σ′(w′) : τ).

Consider the case of E-Fresh-*. As we enter the scope of the
new name it generates, σ and σ′ are extended to map the new names
to a common fresh name, for the sake of the induction hypothesis.

When we exit from the scope, the induction hypothesis tells us
that the results (of the inductive evaluation) are equivalent modulo
the extended substitutions. The results of this evaluation step are
either the same as those of the inductive step, or FAULT. We first
show that the original substitutions suffice to α-equate those two
values, and then that one side faults if and only if the other side
does.

The E-Open-* case proceeds with a similar structure. However,
in this case, we are not generating a single pair of new names,
but unpacking a pair of values, which potentially contain many
names. The crucial lemma to handle this states that, given two
α-equivalent (and ⊢suff-disj) values, breaking them apart into their
children results in pairs of values which are pairwise α-equivalent
modulo a single pair of substitutions. In other words, the subterms
of a sufficiently disjoint value can all be placed into the same
environment without losing any binding information (which would
happen if there were any name collisions). After the induction
hypothesis, E-Open-* proceeds like E-Fresh-*.

The E-If-* case, though simple, is crucial, because it shows
that our induction hypothesis is strong enough to guarantee that a
comparison between two names in ρ will always have the same
result as a comparison between two names in ρ′. In particular, the
injectivity of the substitutions that make ρ and ρ′ α-equivalent is
necessary. □
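The role of injectivity in relating the names of two executions can be illustrated on plain lambda terms. The Python sketch below is the usual α-equivalence check, not the typed relation used in the proof: it keeps a correspondence between bound names in both directions (so the correspondence stays injective), and free names must match exactly. The tuple encoding and the function name `alpha_eq` are hypothetical.

```python
def alpha_eq(s, t, left=None, right=None):
    """Alpha-equivalence of lambda terms encoded as tagged tuples.

    `left` maps bound names of s to bound names of t and `right` is the
    reverse map; checking both directions keeps the pairing injective.
    """
    left = {} if left is None else left
    right = {} if right is None else right
    if s[0] != t[0]:
        return False
    if s[0] == "var":
        a, b = s[1], t[1]
        if a in left or b in right:          # at least one side is bound
            return left.get(a) == b and right.get(b) == a
        return a == b                        # both free: must be identical
    if s[0] == "lambda":
        return alpha_eq(s[2], t[2], {**left, s[1]: t[1]}, {**right, t[1]: s[1]})
    if s[0] == "app":
        return (alpha_eq(s[1], t[1], left, right)
                and alpha_eq(s[2], t[2], left, right))
    raise ValueError(f"unknown term form: {s[0]!r}")
```

Two runs that pick different fresh binders, say g0 and g7, for the same abstraction produce terms that this check identifies, whereas a pairing that sent two distinct names to one name would wrongly equate terms such as λx.λy.x and λa.λa.a; the reverse map rules that out.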

This theorem leaves open the possibility that some α-variant
of ρ might result in an environment ρ′ that cannot yield a result.
The following theorem says that if one α-variant terminates (either
with a value or FAULT), then every α-variant can terminate (and,
by Theorem 5.1, when it does, the value will be α-equivalent to the
result of the original).²

Theorem 5.2 (α-equivalent environments have equivalent termination
behavior).

If $\tau = \mathit{typeof}(\Gamma, e)$
and $\rho =_\alpha \rho' : \Gamma$
and $\Gamma \vdash_{exe} \langle e, \rho \rangle \Rightarrow^{k} w$
then $\exists w'.\ \Gamma \vdash_{exe} \langle e, \rho' \rangle \Rightarrow^{k} w'$ and $w =_\alpha w' : \tau$.

Proof. (Sketch) For every choice of fresh name in the original
computation, choose the same name in the other one. This preserves
α-equivalence of ρ and ρ′ for the induction hypothesis. □

5.1 Example 

Consider again the example in Figure 2. Suppose that we had 
implemented a normal let construct (where the arms do not bind 
names from previous arms), with the type: 

$$
\begin{array}{rcl}
\mathit{LetClauses} & ::= & \mathrm{Prod}() \\
 & \mid & \mathrm{Prod}^{\Uparrow 1 \triangleright 0}(\mathrm{Prod}^{\Uparrow 0}(\mathrm{BAtom}, \mathit{Expr}), \mathit{LetClauses})
\end{array}
$$

The only difference, besides the name, is that the recursive
LetClauses does not have a ↓0. If we had wanted to change the code
in Figure 2 to expand ordinary lets instead, the above change
to the type of Expr is sufficient, and the otherwise identical code
would respect LetClauses's binding behavior! This is a consequence
of Theorem 5.1, which ensures that programs cannot observe anything
about names except their binding structure, as defined by their
binding specifications.

² Since we are using big-step semantics, we cannot talk directly about
non-termination.

6. Checking Binding Safety Statically 

This section describes the Romeo deduction system. The purpose
of this deduction system is to generate constraints (proof
obligations) which, if satisfied, guarantee that escape, and
therefore FAULT, will never occur (see Theorem 6.1).

The proof system's judgment is of the form Γ ⊢proof {H} e {P}.
Like the execution semantics, it is type-dependent, and for a similar
reason: types control which atoms will be bound or free, and thus
whether operations are valid or not.

We use H, P, and C to range over constraints. Typically, H, the
hypothesis, contains facts (about the atoms in the environment) that
are true by construction. P, the postcondition, contains predicates
that describe the connection between atoms in the environment and
atoms in the output. C is used for general constraints, and for the
constraints in let statements.

The obligations emitted by the deduction system must be satisfied
by showing them true for all ρ compatible with Γ; in practice,
we do this with an SMT solver (for example, Romeo-L [10] uses
Z3 [9]).

We begin by giving the syntax of constraints.

$$
\begin{array}{rcl}
z \in \mathit{ConstrSetVar} & ::= & x \mid \bullet \\
s \in \mathit{SetDesc} & ::= & \emptyset \mid s \cup s \mid s \cap s \mid \mathit{sf}(z) \mid \mathcal{F}_e(\Gamma) \\
\mathit{sf} \in \mathit{SetFn} & ::= & \mathcal{F} \mid \mathcal{F}_r \mid \mathcal{F}_b \mid \mathcal{X} \\
H, P, C \in \mathit{Constraint} & ::= & C \land C \mid s = s \mid s \neq s \mid s \mathrel{\#} s \mid s \subseteq s \mid z =_{val} e^{qlit} \mid \mathit{true}
\end{array}
$$

Formulas are constructed from variables z, which range over
program variables, and •, which refers to the output value of the
current expression. Set-valued terms are constructed from the free
names (ℱ), free references (ℱ_r), free binders (ℱ_b), and exposable
names (𝒳) of values, the free names of environments (ℱ_e), and then
by the standard set constructors. Atomic formulas denote equality,
inequality, etc., of sets, plus equality of values. Last, constraints are
conjunctions of atomic formulas.
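To make the reading of these formulas concrete, here is a small Python sketch that evaluates a constraint against known atom sets. It is an illustration under strong simplifications: the four set functions are collapsed into a single lookup table `atoms_of` from names to sets, value equality is omitted, and the tagged-tuple encoding is hypothetical. Romeo-L discharges such obligations symbolically with an SMT solver rather than by evaluation.

```python
def eval_set(desc, atoms_of):
    """Evaluate a set description to a frozenset of atom names."""
    tag = desc[0]
    if tag == "empty":
        return frozenset()
    if tag == "union":
        return eval_set(desc[1], atoms_of) | eval_set(desc[2], atoms_of)
    if tag == "inter":
        return eval_set(desc[1], atoms_of) & eval_set(desc[2], atoms_of)
    if tag == "atoms":                 # stands in for a set function applied to z
        return frozenset(atoms_of[desc[1]])
    raise ValueError(f"unknown set description: {tag!r}")


def holds(constraint, atoms_of):
    """Decide a conjunction of atomic set formulas for concrete sets."""
    tag = constraint[0]
    if tag == "true":
        return True
    if tag == "and":
        return holds(constraint[1], atoms_of) and holds(constraint[2], atoms_of)
    lhs = eval_set(constraint[1], atoms_of)
    rhs = eval_set(constraint[2], atoms_of)
    if tag == "eq":
        return lhs == rhs
    if tag == "neq":
        return lhs != rhs
    if tag == "disjoint":
        return lhs.isdisjoint(rhs)
    if tag == "subset":
        return lhs <= rhs
    raise ValueError(f"unknown constraint: {tag!r}")
```

A hypothesis such as "the atoms of x are disjoint from those of the environment" becomes `("disjoint", ("atoms", "x"), ("atoms", "env"))` in this encoding.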

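We use quasi-literals to describe values with variable interpolation.
Because the type environment Γ is present, the type annotations
of quasi-literals are redundant, but for economy of abstraction,
we elected to reuse an existing concept instead of creating a
new one.

In general, our rules are patterned after those in Pottier [13],
using the type information in Γ to collect information about values.
This subsumes Pottier's A. Most rules discharge their proof
obligations by delegating them to proof obligations on subexpressions.
The base cases of this recursion are P-Call and P-QLit, which
describe proof obligations of the form Γ ⊨ H ⇒ P. P-QLit has
only one obligation, which is to ensure that the result it produces
obeys whatever constraints were imposed in P, given that the
environment satisfies the assumptions in H. P-Call has two
obligations; first, that the invoked function's precondition is true
(given H), and second that the resulting value satisfies the
constraints in P (given H and the postcondition of the function).

$$
\frac{\mathit{typeof}(\Gamma, e^{qlit}) = \tau \qquad \Gamma, \bullet{:}\tau \models H \land (\bullet =_{val} e^{qlit}) \Rightarrow P}
{\Gamma \vdash_{proof} \{H\}\ e^{qlit}\ \{P\}}
\quad \text{P-QLit}
$$

$$
\frac{\begin{array}{c}
\mathit{rettype}(f) = \tau \qquad \mathit{formals}(f) = (x_{formal,i})_i \qquad \mathit{argtype}(f) = (\tau_{formal,i})_i \\
\Gamma \models H \Rightarrow \mathit{pre}(f)[x_{actual,i}/x_{formal,i}]_i \\
\Gamma, \bullet{:}\tau \models H \land \mathcal{F}(\bullet) \subseteq \mathcal{F}_e((x_{actual,i}{:}\tau_{formal,i})_i) \land \mathit{post}(f)[x_{actual,i}/x_{formal,i}]_i \Rightarrow P
\end{array}}
{\Gamma \vdash_{proof} \{H\}\ (f\ (x_{actual,i})_i)\ \{P\}}
\quad \text{P-Call}
$$

$$
\frac{x \text{ fresh for } \Gamma, H, P \qquad \Gamma, x{:}\mathrm{BAtom} \vdash_{proof} \{H \land \mathcal{F}(x) \mathrel{\#} \mathcal{F}_e(\Gamma)\}\ e\ \{P \land \mathcal{F}(x) \mathrel{\#} \mathcal{F}(\bullet)\}}
{\Gamma \vdash_{proof} \{H\}\ (\mathtt{fresh}\ x\ \mathtt{in}\ e)\ \{P\}}
\quad \text{P-Fresh}
$$

$$
\frac{\begin{array}{c}
\forall i.\ x_i \text{ is fresh for } \Gamma, H, P \qquad \Gamma(x_{obj}) = \mathrm{Prod}^{\Uparrow\beta_{ex}}_{i}(\tau_i{\downarrow}\beta_i) \\
\Gamma, (x_i{:}\tau_i)_i \vdash_{proof} \{H \land \mathcal{X}(x_{obj}) \mathrel{\#} \mathcal{F}_e(\Gamma) \land x_{obj} =_{val} (\mathtt{prod}^{\Uparrow\beta_{ex}}\ (x_i{\downarrow}\beta_i)_i)\}\ e\ \{P \land \mathcal{X}(x_{obj}) \mathrel{\#} \mathcal{F}(\bullet)\}
\end{array}}
{\Gamma \vdash_{proof} \{H\}\ (\mathtt{open}\ x_{obj}\ ((x_i)_i)\ e)\ \{P\}}
\quad \text{P-Open}
$$

$$
\frac{\begin{array}{c}
x \text{ fresh for } \Gamma, H, P, C \qquad \Gamma \vdash_{proof} \{H\}\ e_{val}\ \{C\} \qquad \mathit{typeof}(\Gamma, e_{val}) = \tau_{val} \\
\Gamma, x{:}\tau_{val} \vdash_{proof} \{H \land C[x/\bullet] \land \mathcal{F}(x) \subseteq \mathcal{F}_e(\Gamma)\}\ e_{body}\ \{P\}
\end{array}}
{\Gamma \vdash_{proof} \{H\}\ (\mathtt{let}\ x\ \mathtt{where}\ C\ \mathtt{be}\ e_{val}\ \mathtt{in}\ e_{body})\ \{P\}}
\quad \text{P-Let}
$$

$$
\frac{\begin{array}{c}
\Gamma(x) = \tau_0 + \tau_1 \\
\Gamma, x_0{:}\tau_0 \vdash_{proof} \{H \land x =_{val} (\mathtt{inj0}\ x_0\ \tau_1)\}\ e_0\ \{P\} \\
\Gamma, x_1{:}\tau_1 \vdash_{proof} \{H \land x =_{val} (\mathtt{inj1}\ \tau_0\ x_1)\}\ e_1\ \{P\}
\end{array}}
{\Gamma \vdash_{proof} \{H\}\ (\mathtt{case}\ x\ (x_0\ e_0)\ (x_1\ e_1))\ \{P\}}
\quad \text{P-Case}
$$

$$
\frac{\Gamma \vdash_{proof} \{H \land \mathcal{F}(x_0) = \mathcal{F}(x_1)\}\ e_0\ \{P\} \qquad \Gamma \vdash_{proof} \{H \land \mathcal{F}(x_0) \mathrel{\#} \mathcal{F}(x_1)\}\ e_1\ \{P\}}
{\Gamma \vdash_{proof} \{H\}\ (\mathtt{if}\ x_0\ \mathtt{equals}\ x_1\ e_0\ e_1)\ \{P\}}
\quad \text{P-IfEq}
$$

$$
\frac{(x_i{:}\tau_i)_i \vdash_{proof} \{C_0\}\ e\ \{C_1\}}
{\vdash_{proof} (\mathtt{define\text{-}fn}\ (f\ (x_i{:}\tau_i)_i\ \mathtt{pre}\ C_0) : \tau_0\ e\ \mathtt{post}\ C_1)\ \mathit{ok}}
\quad \text{P-FnDef}
\qquad
\frac{\forall i.\ \vdash_{proof} \mathit{fD}_i\ \mathit{ok} \qquad \epsilon \vdash_{proof} \{\mathit{true}\}\ e\ \{\mathit{true}\}}
{\vdash_{proof} (\mathit{fD}_i)_i\ e\ \mathit{ok}}
\quad \text{P-Prog}
$$

Figure 7. Verification rules for the deduction system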

As one might expect, the key rules are P-FRESH and P-OPEN, 
whose definitions are closely connected to E-Fresh-Ok and E- 
Open-Ok. 

Proving the theorems in Section 5 required our language to
have two important properties: that (a) no name can escape the
context that exposed it, except as a bound name, and (b) no name
is exposed twice from two different binding relationships at the
same time (thereby revealing equality of two names that are not
related by binding). For the purposes of dynamically respecting
α-equivalence, property (a) was enforced by detecting such a situation
and emitting FAULT instead, and property (b) was established by
the constraints imposed on the names exposed in E-Fresh-* and
E-Open-*.

Now, for the purposes of the deduction system, property (a) 
appears in the postcondition of both P-FRESH and P-OPEN, as an 
obligation to prove that the exposed free names are disjoint from 
the free names of the result value (spelled '•'), because the purpose 
of the deduction system is to prevent FAULTS. On the other hand, 
property (b) is a guarantee provided by the language dynamics, and 
therefore appears in the hypothesis of both rules, saying that the 
exposed names are guaranteed to be disjoint from the environment 
so far. 

The rules P-Open and P-Case each add additional information
to their hypotheses. This information conveys the relationship
between the atoms in the scrutinee (x_obj and x respectively) and the
atoms in its component(s). In the P-Case case, even though the
underlying values are different, their sets of free atoms (and free
binders and exposable atoms, etc.) are identical, so for the logic's
purposes, they are equivalent.

6.1 Odds and ends 

In the let expression, the body subexpression has the same result
as the expression as a whole, but the value subexpression does
not. Therefore, in P-Let, the condition C (whose • refers to the
value subexpression) must be adjusted for use as a hypothesis for
the body subexpression. Fortunately, the name x refers to the value
subexpression in question, so a simple [x/•] substitution suffices.
H may be used unchanged by both subexpressions because it will
contain no references to •.³

A similar issue occurs in P-CALL. The pre- and post-conditions 
of the function (not to be confused with the expression's postcon- 
dition P) are expressed relative to the formal parameters, which 
are meaningless out of context. Because the actual arguments to a 
function invocation are all required to be variable references (rather 
than allowing them to be whole subexpressions), the solution is 
again simple: a simultaneous substitution from the formals to the 
actuals suffices to make the pre- and post-conditions meaningful in 
the caller's context. 

Shadowing amongst Romeo program variables is incompatible
with the deduction system, because obligations must be able to
refer to (and distinguish) everything in Γ by name. This gives rise
to the requirement that certain x's be fresh for Γ, H, and P; this
requirement is easily satisfied by a simple renaming pass prior to
type and proof checking.
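A renaming pass of the kind mentioned here can be sketched in a few lines of Python. The sketch covers only variables and let over a hypothetical tagged-tuple encoding, and it assumes that source programs contain no names of the form `name_N`, so that the generated names cannot collide with existing ones.

```python
from itertools import count


def uniquify(expr, renaming=None, counter=None):
    """Rename let-bound program variables so that no name is bound twice."""
    renaming = {} if renaming is None else renaming
    counter = count() if counter is None else counter
    tag = expr[0]
    if tag == "var":
        return ("var", renaming.get(expr[1], expr[1]))
    if tag == "let":
        _, name, value, body = expr
        new_name = f"{name}_{next(counter)}"
        new_value = uniquify(value, renaming, counter)   # binder not in scope here
        new_body = uniquify(body, {**renaming, name: new_name}, counter)
        return ("let", new_name, new_value, new_body)
    raise ValueError(f"unknown expression form: {tag!r}")
```

After this pass, an inner let that shadowed an outer x binds a distinct name, so every variable in the type environment can be referred to unambiguously.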

P-Call's body hypothesis contains a term representing extra
information as a consequence of Lemma 6.1, which states that the
free atoms in the result of any expression are a subset of the free
atoms in the environment in which it is evaluated. A similar term
appears in the hypothesis for P-Let's body subexpression. The
proof of Lemma 6.1, and the definition of the type system
(including Γ ⊢type-env ρ), can be found at
http://hdl.handle.net/2047/d20005013.

³ This is different from the strategy used in Pure FreshML [13], in which
postconditions are functions that produce predicates.

Lemma 6.1 (No names made up).

If $\tau = \mathit{typeof}(\Gamma, e)$
and $\Gamma \vdash_{type\text{-}env} \rho$
and $\Gamma \vdash_{exe} \langle e, \rho \rangle \Rightarrow^{k} v$
then $\mathit{fa}(\tau, v) \subseteq \mathit{fa}_{env}(\Gamma, \rho)$.

Proof. By induction on k. □

Finally, in P-IfEq, the result of the comparison can be expressed
in our predicate language; in the branch in which the two
atoms are equal, we note that their free atom sets (known to be
singletons) are equal, and in the other branch, we note that their
free atom sets are disjoint.

6.2 Soundness of the Deduction System 

The soundness of the deduction system is expressed in the follow- 
ing theorem. 

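Theorem 6.1 (Soundness of the deduction system).

If $\tau = \mathit{typeof}(\Gamma, e)$
and $\Gamma \vdash_{type\text{-}env} \rho$
and $\Gamma \vdash_{proof} \{H\}\ e\ \{P\}$
and $\Gamma \vdash_{exe} \langle e, \rho \rangle \Rightarrow^{k} w$
then $w \neq \mathrm{FAULT}$ and $\Gamma, \bullet{:}\tau;\ \rho[\bullet \to w] \models H \Rightarrow P$.

Proof. By induction on k. □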

Observe that the theorem is unusually strong: it says that if there 
is any solution to the proof obligations, then the program is non- 
faulting for any suitably-typed environment p. This depends on the 
fact that the generation of proof obligations is deterministic (given 
pre- and post-conditions for the functions in the system). 

6.3 Example 

The Romeo-L code in Figure 2 contains a number of opens, each 
of which potentially can produce a FAULT. However, our deduction 
shows that FAULT will never happen. A complete derivation is too 
large to include here, but we will informally look at two examples. 

First, on line 5, we are opening up a lambda abstraction. This
"exposes" the lambda's binder (binding it to the variable bv).
Fortunately, (inj_lambda (prod bv, convert(e-body)↓0)), the body of the
open, has no free names from its left-hand child and binds bv in its
right-hand child, so (regardless of the output of convert) the
exposed name from bv is not free in the result.

Second, on line 7, we open up the whole let* form, exposing 
all of the names that he exports. We must show that those names 
do not escape this context. When he-some is destructured, we know 
from its type that bv and he-rest, together, export that same set of 
names. 

The value returned from the open is an application, constructed 
on line 13. Its left-hand-side is a lambda, which binds bv in e-rest. 
Therefore, we need to show that the names exported by he-rest are 
bound in e-rest. Fortunately, e-rest is a let* construct (line 12), 
defined to bind the names exported by he-rest in e-body, which 
is exactly what we needed. Therefore, the left-hand side of the
function application constructed by convert contains no free
references that could cause a FAULT.

Now, we look at the right-hand-side of that application, which
we generate by calling convert(val-expr). By Lemma 6.1, we know
that convert produces a value whose free names are a subset of
those of its argument. How do we know what names are free in val-expr?
We know that, as an expression, it exports nothing, and so has no 
free binders. Any free references in it would have also been free 
in he-some (because it binds no names in the scope of its value 
expression), and therefore free in let-star itself. But let-star is part 
of the environment in which it was opened (on line 7), so, by the 
freshness of newly-exposed names, the names we are worried about 
must be fresh for val-expr. 

A similar argument can be used to verify the safety of the other 
opens. In this example, the programmer didn't need to supply any 
constraints to justify the function calls. In general, constraints are 
necessary for the same reasons as in Pure FreshML [13], and the 
same examples apply. 



7. Related work 

7.1 Statically Specified Binding in Template Macros 

The work of Herman and Wand [7, 8] introduced the idea of 
a static binding specification for a template or pattern-matching 
macro system (like Scheme's syntax-rules). Herman defined 
a language for binding specifications, and gave an algorithm for 
deciding whether a pattern-and-template macro was consistent with 
its binding specification. In practice, however, the complex macros 
in a language like Scheme are often not expressible in a pattern- 
matching system. Romeo provides a path for extending this macro 
system to a procedurally-based one, like Scheme's syntax-case. 

Although our binding annotation system is very similar in 
power to Herman's, we have made some changes in representation. 
The most noticeable is that where Herman and Wand use addresses 
into binary trees of values, we use indices into wide products. 



7.2 Pure FreshML 

The second source for this work is Pure FreshML [13]. Both Pure
FreshML and Romeo are first-order, side-effect-free languages in
which a runtime system ensures that introduced names do not
escape their scope, and both provide a proof system that generates
proof obligations which, if true, guarantee statically that no faults
will occur. One difference, important for our intended
application of macro-expanders, is that Romeo manipulates plain
S-expression-like data, guided by types, whereas Pure FreshML
saves type information in values. Our presentation of the language
and semantics is somewhat different: for example, we have separate
constructs for destructuring products (open) and destructuring
sum types (case).

The system in [13] leaves the actual language of binding
specifications underdetermined. All the formal development is done in a
simple system, roughly equivalent to the λ-calculus, but one of the
key examples, normalization by evaluation, is done using the more
expressive system of Cαml [12]. Still, both of these systems are too
weak to express complex binding constructs. For example, neither
can express the natural syntax of the let* construct.

Also, our language provides stronger guarantees than does Pure
FreshML. The primary claim in [13] is that no fresh name escapes
its scope. This is much weaker than respecting α-equivalence.
Consider a boolean-valued function that tests its arguments for
syntactic (not α-) equivalence. This function
would not violate the no-escape condition, but of course it violates
α-equivalence. We conjecture that Pure FreshML does respect
α-equivalence, but we have not attempted to prove this.

7.3 Ott

Ott [16] is a system for metaprogramming that accepts binding 
specifications with a syntax and semantics similar to ours. How- 
ever, Ott's goals are significantly different. Instead of providing 
a complete, name-aware programming system, Ott generates code 
for use in a theorem-prover, including definitions of types and a 
capture-avoiding substitution function. Ott supports a number of 
theorem-provers and a number of representations for the terms in 
them. Additionally, it can export boilerplate code for OCaml. 

Ott's binding specifications are strictly more expressive than 
ours: effectively, they allow for a single value to export multiple 
sets of names (these sets are designated by "auxiliary functions"), 
which can be bound separately. 

In order to support theorem-provers, Ott includes a definition
for α-equivalence between its "concrete abstract syntax trees" (the
equivalent of our values v). Their definition is based on a partial
equivalence relation which relates two tree positions in a term if
they are connected by binding (i.e., they would have to be renamed
together). From this intra-tree relation, it is fairly straightforward to
extract a notion of α-equivalence: two trees are α-equivalent if both
(1) their free names match, and (2) the partial equivalence relations
representing their bound names are identical. It is not clear whether
this definition would lead to simpler proofs of theorems like 5.1
and 5.2.

7.4 Hygienic Macro Systems in Scheme 

The goals of our work have a great deal in common with the goals 
of hygienic macro systems, like those used in Scheme. There are 
two major problems in dealing with hygienic macro systems in 
Scheme. The first is that there is not yet a widely-accepted, formal, 
implementation-independent definition for the property of being 
hygienic. The algorithm described by Dybvig [3], which is the basis 
for hygienic macro expansion in Scheme, seems "correct" in the 
sense that it tends to behave consistently with the intuition of its 
users. But there is no specification against which it can be proved 
correct. The closest things to a specification, given by Clinger 
[2] and Dybvig [3], are phrased in terms of the bindings inserted 
or introduced by the macro. However, given a macro definition 
(say, in template style), there is no obvious way to tell what those 
bindings should be without reference to the expansion algorithm. 
So this specification remains circular. The use of static binding 
specifications (introduced for this purpose in [7]) provides a more 
rigorous basis for hygiene. 

Secondly, the Dybvig algorithm offers no static guarantees. If 
the designer makes a mistake, it will only be discovered after 
the macro is expanded (and probably after it is used, perhaps 
by an innocent end-user). By contrast, our system offers a static 
guarantee: if the macro definition binds a name incorrectly, the 
error will be detected at macro-definition time by the deduction 
system, not as a runtime fault. 

Even simple macros can cause unexpected results in traditionally- 
hygienic systems. Suppose we want a macro that translates 

(lazy-let ( (x (long-calculation))) 
... x ...) 

into 

(let ( (x (delay (long-calculation)))) 
. . . (force x) . . . ) 

If the body of the lazy-let contains a macro invocation like
(my-macro1 x x), how is the expander for lazy-let to know
whether either of the x's refers to the x in the declaration of
the lazy-let? In the absence of a binding specification for
my-macro1, the only solution is to expand the macros bottom-up,
which is undesirable for other reasons. If we had a binding
specification for my-macro1, we could perform this transformation
without reference to the expansion of my-macro1.
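
The role of the binding specification in this example can be made
concrete with a small sketch. It is written in Python rather than in
Romeo, with S-expressions encoded as nested lists, and everything in
it is an assumption made for illustration: the role names "expr",
"binder" and "body", the function names, and the two candidate
specifications for the macro invoked in the body. It is not the
specification language of this paper, only a minimal model of why the
rewrite of x into (force x) needs to know which argument positions
are references.

```python
def force_refs(term, name, specs, macros):
    """Wrap each free reference to `name` in `term` as (force name).

    specs  : hypothetical binding specifications, macro name -> list of roles
             "expr"   : evaluated in the enclosing scope
             "binder" : a name bound by the macro
             "body"   : a term in which the macro's binders are in scope
    macros : names known to be macros (with or without a specification)
    """
    if isinstance(term, str):
        return ["force", term] if term == name else term
    if not term:
        return term
    head, *args = term
    if not (isinstance(head, str) and head in macros):
        # Ordinary application: every subterm is an expression in this scope.
        return [force_refs(t, name, specs, macros) for t in term]
    if head not in specs:
        # Without a specification we cannot tell references from binders.
        raise LookupError(f"no binding specification for macro {head!r}")
    roles = specs[head]
    shadowed = any(r == "binder" and a == name for r, a in zip(roles, args))
    out = [head]
    for role, arg in zip(roles, args):
        if role == "binder" or (role == "body" and shadowed):
            out.append(arg)  # a binder, or a body where `name` is rebound
        else:
            out.append(force_refs(arg, name, specs, macros))
    return out


def expand_lazy_let(term, specs, macros):
    """Translate (lazy-let ((x init)) body) into (let ((x (delay init))) body')."""
    _, [[x, init]], body = term
    return ["let", [[x, ["delay", init]]], force_refs(body, x, specs, macros)]


term = ["lazy-let", [["x", ["long-calculation"]]], ["my-macro1", "x", "x"]]
macros = {"my-macro1"}

# Assumption 1: both arguments are ordinary expressions.
as_refs = expand_lazy_let(term, {"my-macro1": ["expr", "expr"]}, macros)

# Assumption 2: the first argument is a binder scoping over the second.
as_binder = expand_lazy_let(term, {"my-macro1": ["binder", "body"]}, macros)
```

Under the first assumed specification both occurrences of x are
rewritten to (force x); under the second neither is, because the
inner x is rebound. With no specification at all the sketch raises an
error instead of guessing, which corresponds to the situation in
which the expander has no choice but to expand the inner macro first.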

7.5 Binding in Theorem-Proving Systems 

Ever since the POPLMark Challenge [1], there has been considerable
interest in encoding terms with bindings in various proof assistants
[11, 14, 17]. These works have different goals from ours; they
are primarily concerned with proving facts about programs, while
we are aiming at a usable meta-programming system. They also
generally depend on representing abstract syntax trees in a
pre-existing theorem-proving framework like Coq or Agda, whereas
we are concerned with the complications of concrete syntax (even
in an S-expression based language). We observe that Pouillard
and Pottier [15] call a function "well-behaved" iff it preserves
α-equivalence, and judge a type system to be satisfactory only if
the functions definable in the system are well-behaved.

7.6 Extensions to Romeo 

As we have described it, writing programs in Romeo is tedious.
Programs must be written in an ANF-like style. For example, the
arguments to function calls must be variables and not more
complicated expressions.

A second problem is that we have not yet described how the
truth of Γ ⊨ H ⇒ P is to be determined, and, if it fails, how the
programmer is supposed to figure out how to fix it.

These problems are addressed in Muehlboeck's master's thesis 
[10], which presents a more usable front-end for Romeo, called 
Romeo-L. Romeo-L programs are written in a more natural dialect, 
and are automatically translated into the core Romeo we have 
described here. This translation introduces let expressions to avoid 
the need to program in A-normal form, and for each let it infers 
the strongest possible constraint C. Therefore the user need only 
supply constraints for function definitions. In practice, this means 
a vastly reduced annotation burden. 
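
The let-introducing part of such a translation can be sketched as
follows. This is a generic A-normalization pass written in Python
over nested lists, not the translation actually used by Romeo-L: the
function name to_anf, the temporaries t1, t2, ..., and the four-element
let form are all hypothetical, the temporaries are assumed not to
occur in the input program, and the inference of constraints for each
let is not modeled at all.

```python
import itertools


def to_anf(expr):
    """Name every nested call so that calls take only variables as arguments.

    Variables are strings; a call is a list [function, arg, ...].
    The result uses a hypothetical form ["let", name, call, body].
    """
    counter = itertools.count(1)
    bindings = []  # (temporary, call with variable arguments), in evaluation order

    def atomize(e):
        # Return a variable that stands for the value of e.
        if isinstance(e, str):
            return e
        fn, *args = e
        call = [fn] + [atomize(a) for a in args]
        tmp = f"t{next(counter)}"  # assumed fresh with respect to the input
        bindings.append((tmp, call))
        return tmp

    if isinstance(expr, str):
        return expr
    fn, *args = expr
    result = [fn] + [atomize(a) for a in args]
    # Wrap the final call in the collected bindings, innermost binding last.
    for tmp, call in reversed(bindings):
        result = ["let", tmp, call, result]
    return result


nested = ["f", ["g", "x"], ["h", ["k", "y"]]]
flat = to_anf(nested)
# flat == ["let", "t1", ["g", "x"],
#          ["let", "t2", ["k", "y"],
#           ["let", "t3", ["h", "t2"],
#            ["f", "t1", "t3"]]]]
```

The programmer writes the nested form, and only the flat form, in
which every function argument is a variable, needs to be checked by
the core system.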

Romeo-L also includes a connection to the Z3 SMT solver [9],
which is able to check statements of the form Γ ⊨ H ⇒ P,
completing the automated checking of the deduction system.
Furthermore, it can translate counterexamples provided by Z3 into 
sets of names, so that the user can understand them, and it can 
explain how they violate a constraint either written by the user, or 
implicit in the rules for fresh or open. We hope to report on this in 
a separate paper. 
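
To give a feel for what such a check and its counterexamples look
like, the following sketch replaces the SMT solver with a brute-force
search over a small universe of names. It does not use Z3 and is not
the Romeo-L implementation: the function names, the encoding of
hypotheses and conclusions as Python predicates, and the two-name
universe are assumptions made for illustration. Because the search is
bounded, a result of None only means that no counterexample exists
within the chosen universe.

```python
from itertools import combinations, product


def subsets(universe):
    """All subsets of `universe`, as frozensets, smallest first."""
    items = sorted(universe)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


def find_counterexample(variables, hypothesis, conclusion, universe=("a", "b")):
    """Search for sets of names satisfying `hypothesis` but not `conclusion`.

    Returns a dict mapping each set variable to a frozenset of names,
    or None if no such assignment exists over `universe`.
    """
    candidates = subsets(universe)
    for values in product(candidates, repeat=len(variables)):
        env = dict(zip(variables, values))
        if hypothesis(env) and not conclusion(env):
            return env
    return None


def disjoint(s, t):
    return not (s & t)


# Holds: if X is disjoint from Y and Z is a subset of Y, X is disjoint from Z.
ok = find_counterexample(
    ["X", "Y", "Z"],
    lambda e: disjoint(e["X"], e["Y"]) and e["Z"] <= e["Y"],
    lambda e: disjoint(e["X"], e["Z"]),
)

# Fails: disjointness of X and Y says nothing about X and Z.
bad = find_counterexample(
    ["X", "Y", "Z"],
    lambda e: disjoint(e["X"], e["Y"]),
    lambda e: disjoint(e["X"], e["Z"]),
)
```

Here ok is None, while bad is an assignment of concrete sets of names
in which X and Z share a name. A counterexample presented in this
form, as sets of names rather than as a raw solver model, is the kind
of feedback the paragraph above describes.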